Location Data Analytics in the Business Value Chain: A Systematic Literature Review

Context information has become a significant asset to optimize the value obtained from information systems. Location is an important type of context information that refers to the place in which an event occurs. In business environments, the implementation of location-based analytics systems to aid decision-making processes is of paramount importance for business development. However, after an exhaustive literature review, we found that researchers and practitioners still lack a comprehensive characterization of location-based data analytics systems that have been effectively applied to business processes. This paper presents the results of a systematic literature review (SLR), in which we characterized a total of 168 location-based and business-oriented analytics solutions that were published between 2014 and 2019. To conduct this SLR we defined three characterization dimensions: business aspects, through which we identified value chain business processes or activities that may benefit from the proposed solution; data source, which allowed us to report on the data used in each of the studies; and data analytics, through which we report on the analytics techniques and validation strategies implemented by the studied approaches. The contribution of our SLR is twofold. First, it provides business and data analytics practitioners with a comprehensive catalog of location-based data analytics approaches that could be applied to improve value generation, at different levels, along their businesses’ value chains. Second, it provides researchers with a complete landscape of recent advancements and open challenges in the field.


I. INTRODUCTION
Location information is now abundantly available owing to the proliferation of technologies and services that facilitate the collection of location data [1]. Furthermore, the evolution of technology has eased operational work in several aspects of our lives. In particular, data is very useful to drive decision making in operations [2]. In the business context, researchers have demonstrated that location information can be exploited to achieve operational goals [3], [4].
The value of location information, as an asset to improve decision support mechanisms, has been studied thoroughly in several problem types [5]. However, we found that researchers, and especially practitioners interested in exploiting location context in the business domain, still lack comprehensive and domain-independent surveys, particularly systematic literature reviews, that connect common business needs with data analytics approaches in which the usage of location information plays a major role. We found eight other surveys associated with business processes and location context; nevertheless, their results could be considered narrow since they focus on specific industries or analysis techniques.
The associate editor coordinating the review of this manuscript and approving it for publication was Xin Luo.
With the goal of providing researchers and practitioners with a more comprehensive characterization of approaches that exploit location context and that are applicable to business processes, we surveyed 168 papers by following the guidelines proposed by Kitchenham et al. [6]. Our survey was conducted to answer five research questions that are relevant to researchers and practitioners interested in this research field: i) what are common use cases using location information to generate business value? ii) what analytics techniques have been used to generate business value using location context? iii) what have been the primary sources for data extraction, and what kind of information was used? iv) how has location information been exploited to generate business value? and v) what metrics have been used to evaluate and validate the models?
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The findings obtained from the papers characterized in our SLR are classified into three groups: i) business process findings, which refer to business process aspects that were addressed using the studied approaches; ii) data source findings, which refer to the primary data sources that were exploited in the studied solutions; and iii) data analysis findings, which report on the techniques, metrics, and validation strategies used or proposed by the analyzed studies. To characterize business process aspects, we revisited the value chain concept proposed by Porter [7].
Findings suggest that there is an increasing interest in this field. Regarding business process aspects, there are applications in all processes of the value chain, particularly in operational processes. Data source findings show that most of the studies use proprietary and surveyed data sets, although there are several studies using location-based social networks (LBSN). Finally, we collected and organized the data analysis findings to easily observe the problem types, data sources, and analysis techniques used by the authors of the surveyed studies. Location analysis, in particular applied to the facility location problem, was the most common approach among the studies.
This paper is structured as follows. Section II explains the background of this SLR. Section III discusses related work by analyzing the contributions of our SLR with respect to other similar surveys in the area. Section IV explains the methodology we followed to conduct this review. Sections V-VII present the findings and contributions of this work. Section VIII summarizes limitations and research opportunities we identified during the elaboration of this SLR. Finally, Section IX concludes the paper.

II. BACKGROUND
This section presents foundational concepts that are relevant to the application of location analytics to business value chain challenges.

A. THE BUSINESS VALUE CHAIN
The concept of business value chain was introduced by Porter in 1985 [7]. In his book, Porter describes the business value chain as the collection of activities a company performs in order to design, produce, market, deliver and support products. He also defines a set of primary activities that can be categorized into five groups:
• Inbound logistics: Activities related to receiving, sorting, handling and managing the inputs or raw material the company needs in order to create its products.
• Operations: Activities associated with the transformation of the inputs into the final product, including design, prototyping, testing, and fabrication.
• Outbound logistics: Activities associated with the collection, storage, management and distribution of the final product to the customers.
• Marketing & sales: Activities related to the promotion, communication, sales force and everything a company does to enable customers to purchase products.
• Service: Activities related to service provision, maintenance, adjustments, and support necessary to enhance or sustain the life of the product or acquisition.
The business value chain concept has matured well in the business domain, and it is actively used as a reference to elaborate business strategies to attain competitive advantage. It is also compared with other definitions, such as core process, and it is used as the basis for other business strategy concepts [8]. In this SLR, the analysis of the business dimension is based on Porter's business value chain concept.

B. LOCATION CONTEXT
According to Abowd et al., context is ''any information useful to characterize the situation of an entity (e.g., a user or an item) that can affect the way users interact with systems'' [9]. In general, context information has the potential to improve the quality of results of data-oriented solutions.
There are several categories of context information; Villegas and Müller classify them as Individual, Location, Time, Activity, and Relational. Furthermore, they define location context as ''all the information about the place of settlement or activity of an object'' [10]. Location context is the subject of study in this paper.
In recent years, location context has been widely used in different industry domains such as biofuel [11], [12], transportation [13], [14], tourism [15], [16], and retail [17], [18], among others. This wide interest in location context demonstrates that the positional relationships that may exist among entities can be used to infer other information that affects businesses and customers. Hence, investigating new methods and techniques to exploit location context in the business realm is not only relevant, but also generates a significant impact [19].
Traditional statistical analysis can also be applied in spatial analysis, e.g., a researcher may want to examine the spatial distribution of the studied instances and perform experiments on it. Moreover, location context can play an additional role. For example, two frequency distributions can be numerically equivalent, yet the way the instances are located in space can change how the data is interpreted. Fig. 1 shows this phenomenon graphically. In both images, we observe the same number of instances in the same number of squares; however, they have different spatial distributions. The spatial relationship between instances can significantly contribute to the discovery of latent information that might reveal additional insights of significant value for the use case.
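The point that identical frequency distributions can hide different spatial arrangements can be illustrated with a small sketch. The two 4x4 grids below are hypothetical (they are not the data behind Fig. 1), and Moran's I with rook (edge-sharing) adjacency is used here only as one common spatial autocorrelation measure; the source does not prescribe a specific statistic.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook (edge-sharing) adjacency.

    Assumes the grid is not constant (otherwise the variance term is zero).
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]

    cross = 0.0   # sum of dev_i * dev_j over ordered neighbor pairs
    weights = 0   # number of ordered neighbor pairs (binary weights)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weights += 1
    variance_sum = sum(d * d for row in dev for d in row)
    return (n * cross) / (weights * variance_sum)


def value_counts(grid):
    """Frequency view of the grid: the sorted cell values, ignoring position."""
    return sorted(v for row in grid for v in row)


# Hypothetical grids: eight occupied cells each, arranged differently.
clustered = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(value_counts(clustered) == value_counts(checkerboard))  # same frequencies
print(round(morans_i(clustered), 3), round(morans_i(checkerboard), 3))
```

Under these assumptions, both grids have the same cell-value frequencies, yet the clustered grid yields a positive Moran's I while the checkerboard yields a negative one, which is the kind of information a purely numerical summary would miss.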

C. LOCATION CONTEXT PROBLEM TYPES
Based on the findings of our SLR, the following can be considered the most common location context problem types:
• Location analysis: Location analysis focuses on selecting the optimal location for a particular goal [20]. It includes the facility location and coverage problems. Actual applications may vary depending on the industry domain, e.g., selecting optimal locations for a factory or designing transportation routes to optimize service coverage [4].
• Spatial analysis: Spatial analysis extends traditional statistical analysis by including the spatial components of the data, such as the spatial relationship between instances [21]. There are several sub-types of analysis depending on the needs and context. For example, pattern identification, clustering, and geographic relationship analysis.
• Recommendation systems: Recommendation systems emerged as a response to a question that is relevant to any industry sector: What are the most appropriate items for a particular user? There are several approaches to solve this problem, and some include context information. Recommender systems that rely on context information to improve their recommendation models are known as context-aware recommendation systems [22].
• Point-of-interest (POI) recommendation: This refers to a specific type of recommendation system where the item to recommend is a location itself. A POI is a specific place a user might find useful or interesting. Several applications have emerged from the need to recommend places (POIs) to users, for example Foursquare, TripAdvisor, Yelp, and Google Maps. As a result, in recent years, POI recommendation has gained considerable popularity as a research field. We consider this as a separate problem type given its significance to the findings of our SLR.
• Behavioral pattern detection: Location context is exploited to detect or infer behavioral patterns among elements of a system. One common use case is exploiting the movement behavior of users to improve mobile services [23].
• Predictive analysis: This problem type is analogous to predictive analytics or supervised learning. The idea is to model data, entities or events, and predict values for new instances. The use cases are significantly diverse. With respect to location context, some examples are predicting the best business type for a particular location [17], predicting business performance according to location and social context [24], and predicting the quality of mobile services according to user locations [25].
• Trajectory tracking: Trajectory data from entities is collected to be used as input to analyze movement and behavioral patterns. The goal is to learn movement patterns in conjunction with other variables, to build a model capable of predicting the next location for an entity or model routing patterns. Imai et al. [26] suggest that early destination prediction helps improve personalized services.
• Location-based relationships: Using location data, it is possible to model similarities between two entities, as there might exist a latent relationship between them depending on how they move or the places they frequent [27], [28]. User similarity can be used for POI, user, or other recommendation types.
• Location-aware advertising: Location-aware advertising is a marketing technique aimed at better-targeted advertising, i.e., sending ads related to a particular product or service to locations where people seem more likely to engage with it. Ubiquitous devices are a crucial tool given their location sensing capabilities [29].
• Matching: Matching is an ephemeral association established between entities based on their current context and state, e.g., pairing two different entities where the first one has a request that must be satisfied by the second one. There must be a criterion that can match them together, and this relationship can last until the request is satisfied. Matching is common, for example, among services that depend on a geographical location like ride-sharing or transportation [14].
• Location estimation: Location estimation is the action of inferring the position of an entity based on other information about it. In the context of location-based social networks (LBSN), this is useful because, despite the importance of location context, a significant amount of data still does not include it [30]. Some possible reasons are: users choose not to share their location, or the samples are not collected with this attribute. The lack of geotagged samples creates the need to identify the latent location from the available data [31].
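To make the matching problem type more concrete, the following is a minimal sketch, not taken from any surveyed study, in which each request is paired with the nearest free provider within a distance threshold. The greedy strategy, the identifiers, the coordinates, and the `max_km` threshold are all illustrative assumptions; distances use the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def match_requests(requests, providers, max_km):
    """Greedy matching: each request, in order, takes the nearest free
    provider if it lies within max_km; a matched provider becomes unavailable."""
    free = dict(providers)  # copy so the caller's dict is not modified
    matches = {}
    for req_id, req_pos in requests.items():
        best = min(free, key=lambda p: haversine_km(req_pos, free[p]),
                   default=None)
        if best is not None and haversine_km(req_pos, free[best]) <= max_km:
            matches[req_id] = best
            del free[best]
    return matches


# Hypothetical riders (requests) and drivers (providers) as (lat, lon).
requests = {"r1": (0.0, 0.0), "r2": (0.0, 1.0), "r3": (50.0, 50.0)}
providers = {"p1": (0.0, 0.01), "p2": (0.0, 0.9), "p3": (10.0, 10.0)}

print(match_requests(requests, providers, max_km=20.0))
```

In this toy setting, r3 stays unmatched because the only remaining provider is far beyond the threshold. A production system would likely use a globally optimal assignment rather than this greedy pass; the sketch only illustrates the ephemeral, criterion-based pairing described above.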

D. DATA ANALYTICS
Data analytics is a common approach to address the challenges of the location context problem types presented before. The following are the types of data analytics identified in the papers studied in this SLR:
• Non-predictive approaches: Non-predictive approaches are commonly based on descriptive analysis or unsupervised learning. Descriptive analysis approaches, based on the concept of descriptive statistics [32], seek to describe and summarize the main characteristics in a data set, with the aid of statistical and mathematical modeling, and present the results in the form of tables, charts, and graphs, among other visualization options. In machine learning approaches, other unsupervised learning techniques can be used to perform numerical analysis and discover insights from the data [33].
• Predictive approaches: Predictive approaches are associated with predictive analytics or supervised learning. In these cases, data sets contain the target values to predict, and an algorithm is used to ''learn'' numeric relationships between the attributes and the expected value [33]. The prediction of continuous numeric values is called regression [2], [34], and the prediction of a class or category is called classification [2].
• Inference: Whereas machine learning is generally more focused on prediction, statistics is more concerned with inference, i.e., it focuses on how data is generated as a function of the input data [35], and on the relationships between the independent and the dependent features.
• Optimization: Mathematical or numerical optimization, according to Nocedal and Wright [36], is the process of numerically analyzing a physical system to formulate an objective f(X_n) that depends on certain characteristics of the system. The objective should be something to improve or optimize, e.g., maximize profit or minimize costs. The system characteristics are variables whose optimal values are yet unknown, and the objective is often subject to constraints, i.e., the optimal solution should also meet a set of particular numeric constraints [36]. Findings suggest that most solutions to the facility location problem are addressed as mathematical optimization problems.
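As an illustration of how a facility location problem can be posed as an optimization problem, the sketch below uses a small p-median style formulation: the objective is the demand-weighted distance from each customer to its nearest open facility, and the constraint is that exactly p facilities are opened. This formulation, the exhaustive search, and all coordinates and demands are assumptions chosen for illustration; the surveyed studies use a variety of models and solvers.

```python
import math
from itertools import combinations


def total_cost(open_sites, customers):
    """Objective: demand-weighted distance from each customer to its
    nearest open site (planar Euclidean distance)."""
    return sum(
        demand * min(math.dist(position, site) for site in open_sites)
        for position, demand in customers
    )


def best_sites(candidates, customers, p):
    """Exhaustively evaluate every way of opening exactly p candidate sites
    (the constraint) and return the combination with minimum total cost."""
    return min(combinations(candidates, p),
               key=lambda sites: total_cost(sites, customers))


# Hypothetical planar instance: (position, demand) pairs and candidate sites.
customers = [((0, 0), 1), ((0, 2), 1), ((10, 0), 1), ((10, 2), 1)]
candidates = [(0, 1), (5, 1), (10, 1)]

chosen = best_sites(candidates, customers, p=2)
print(chosen, total_cost(chosen, customers))
```

Exhaustive enumeration is only viable for tiny instances such as this one, since the number of combinations grows quickly with the number of candidate sites; realistic instances would call for integer programming or heuristics.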

III. RELATED WORK
We found eight surveys published in relevant journals since 2010. These surveys consider location context as a primary aspect to analyze together with at least a particular aspect of a business or system. We did not consider reviews in which location context was a secondary attribute, since our main focus is precisely the exploitation of this information in favor of the results.
Aiming at a comprehensive understanding of the use cases, applications and techniques, our SLR is an industry- and problem-type-independent characterization of location context usage in studies or solutions focused on the business value chain. We characterized the surveyed papers in light of three dimensions: business process, data sources, and data analysis. Table 1 compares our SLR (last row) with the eight relevant location-related surveys found in the state of the art. This comparison is based on six criteria that we define as follows: i) SLR, the survey follows a systematic literature review [6] approach as methodology; ii) Not focused on a specific industry domain, the survey scope is not tied to a specific industry or sector, e.g., the study does not focus solely on the hotel industry; iii) Not focused on a specific problem type, the survey scope is not tied to a specific problem type, e.g., the study is not focused solely on optimization problems; iv) Business process findings, the survey includes business level characterization dimensions based on accepted business or operation strategy models; v) Data source findings, the survey contains data sources or data characterization dimensions; vi) Data analysis findings, the study elaborates on the solution approaches followed by the surveyed papers. The check mark (✓) in the corresponding cell indicates the study is compliant with that given criterion.
According to Table 1, four surveys focus on particular industry domains: Hotels & Tourism [37], [41]; multinational enterprises [39]; and social media [40]. Two reviews work on specific problem types: Recommendation systems [37], and Optimization [38]. Only three studies dedicated a section to elaborate on business details: Lim et al. [37] include dimensions to study operation aspects of tourism businesses; Nielsen et al. [39] elaborate on business processes of multinational enterprises (MNEs); and Yang et al. [41] elaborate on hotel industry business details. Finally, Keenan and Jankowski [5] survey a considerable period of 30 years of spatial decision support systems, but they focus on building networks and publication relationships among the authors of the surveyed studies. None of the studies considered general and domain-independent business processes or activity aspects, or general technique usage.

IV. METHODOLOGICAL ASPECTS
This SLR followed the methodology suggested by Kitchenham et al. [6]. Our goal was to understand how location context is being used to enhance analytics in business areas. For this, we defined the following research questions:
• RQ1: What are common use cases using location information to generate business value?
• RQ2: What analytics techniques have been used to generate business value using location context?
• RQ3: What have been the primary sources for data extraction, and what kind of information was used?
• RQ4: How has location information been used to generate business value?
• RQ5: What metrics have been used to evaluate and validate the models?
We conducted a bibliographic search on journal papers published in ACM, IEEE, Scopus, ScienceDirect, and SpringerLink. These were selected because of their relevance to the topic and the acceptable quality of their publications. The search string used was: (''location analytics'' OR ''location analysis'' OR ''location context'' OR ''location intelligence'') AND (consumer OR customer OR client OR retail OR sales OR marketing OR logistics OR operation OR service).
The inclusion criteria consisted of: i) publication date, we selected papers published between 2014 and 2019; ii) publication type, we considered only journal articles; iii) number of pages and language, we excluded publications with fewer than eight pages or written in languages other than English; iv) relevance, we studied article titles and abstracts and verified their relevance to our research questions. Additionally, when the database offered the option to filter by area, we considered the following: engineering, computer science, environmental sciences, decision sciences, business and management, earth sciences, economics and finance. When the database did not offer this option, we inspected the results manually and considered only articles within the mentioned areas. Fig. 2 shows the look-up and filtering process. The first search resulted in 3,223 articles. Due to query limitations with the logic connectors in ScienceDirect, we had to split the query into two parts, which generated some duplicates that we had to exclude manually. While inspecting the content of the articles, we excluded 2,848 during the first screening, then another 207 due to lack of relevance, or because it was not possible to identify the information required by our characterization dimensions.
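The metadata-based inclusion criteria i) to iii) lend themselves to a simple automated pre-filter, sketched below. The `Record` structure, its field names, and the `"journal"` and `"en"` codes are hypothetical and only illustrate the logic; criterion iv) (relevance) depends on reading titles and abstracts and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical bibliographic record; field names are illustrative."""
    title: str
    year: int
    kind: str       # e.g., "journal" or "conference"
    pages: int
    language: str   # e.g., "en"


def passes_metadata_criteria(rec):
    """Inclusion criteria i) to iii): 2014-2019, journal article,
    at least eight pages, written in English."""
    return (
        2014 <= rec.year <= 2019
        and rec.kind == "journal"
        and rec.pages >= 8
        and rec.language == "en"
    )


# Made-up records used only to exercise the filter.
records = [
    Record("Sample journal study", 2016, "journal", 12, "en"),
    Record("Too short", 2018, "journal", 5, "en"),
    Record("Out of range", 2012, "journal", 15, "en"),
    Record("Conference paper", 2017, "conference", 10, "en"),
]
kept = [r.title for r in records if passes_metadata_criteria(r)]
print(kept)
```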
Since we strictly focused on location context in business processes, we defined an additional set of inclusion and exclusion criteria regarding the content and purpose of each study. In short, we looked for studies where location context played a major role in solving business process challenges using analytics techniques. Furthermore, we excluded studies focused on environmental settings such as weather, traffic, natural disasters, or waste control since we do not consider these cases to be easily traceable as business processes. Similarly, we excluded studies related to human nature aspects such as culture, crime, and society since we consider these cases beyond our scope. Finally, we excluded purely theoretical research work, particularly studies focused on new mathematical methods and proofs without demonstrating their applicability. In the end, we obtained a total of 168 relevant articles that we characterized comprehensively.
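The counts reported for the filtering process can be cross-checked with simple arithmetic. The sketch below assumes that the manually removed duplicates are already included in the first-screening exclusions, which the text does not state explicitly.

```python
# Counts as reported in this section.
initial_hits = 3223
excluded_first_screening = 2848
excluded_relevance_or_missing_info = 207

remaining = (initial_hits
             - excluded_first_screening
             - excluded_relevance_or_missing_info)
print(remaining)  # 168
```

Under that assumption, the remainder equals the 168 articles that were characterized.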

V. BUSINESS PROCESS FINDINGS
This section summarizes the findings related to the contributions of the studied approaches to the business processes introduced in Section II-A, along with other aspects that we considered relevant to answer RQ1.
We observed a diverse set of industries exploiting location information in different problem types. Fig. 3 presents a cross between the business value chain processes and the problem types described in Section II. We observe the prevalence of location analysis and spatial analysis in different industries and segments. Also, we observe that with respect to logistics, there are only location and spatial analysis applications. Finally, the set of addressed problem types was diverse along different industries and value chain processes.
Next, we discuss our findings in light of three subdimensions: i) Industry domain, by identifying the industry domain targeted by each study; ii) Business value chain, by identifying and classifying the studied approaches with respect to their impact on the value chain processes; and iii) Decision impact, by identifying, at a high level, in terms of time and cost, the impact of the model or study output, and whether it was used to support strategic decisions or as an end user functionality.

A. INDUSTRY DOMAIN FINDINGS
We define industry domain as the particular economic activity or business type to which a study was applied. We found 23 different industry domains (cf. Table 2). Within the element named Other, we included domains for which only one study was identified. These include: aquaculture, mining, music, news, food, public infrastructure, entertainment, and construction. Table 2 also shows the proportion of papers per industry domain. The predominant category is ''Not specific'', which represents 23.81% of the studied papers. This group corresponds to studies that did not target a unique business case, but whose setting, findings and results are connected to real business problems and were validated with realistic data. One example of this category is the work of Önden [44], which elaborates on the single facility location problem in the context of logistics activities in Istanbul, Turkey. The study focuses on the inclusion of Geographic Information Systems (GIS) in location selection processes. Another interesting case is the work by Xin et al. [23], which elaborates on recommendation systems that exploit user mobile traces.
Biofuel, transportation, and health-care industries occasionally share interests in using location information in their respective use cases. Biofuel generally focuses on facility location problems, i.e., to find the best location to install refineries and processing facilities. Transportation, including cargo, also focuses on locating logistics stations such as bike sharing stations [13], [95]. Other studies focus on urban transportation networks [97], [100] and ride sharing [14], [103]. Health-care focuses mostly on designing attention center networks to optimize user accessibility [107], [108].
In retail, we found a diverse set of approaches, several of them focused on analyzing market behavior, prices and economic conurbations [121], [125], [127]. Yu et al. [17] exploit location information for shop type recommendation, i.e., given a particular location, recommend the best shop type to open there. In Tourism, we found several approaches in hospitality that address both the company and market performance standpoints. For example, Gutiérrez et al. [134] apply spatial analysis to assess the impact of Airbnb in Barcelona.

B. BUSINESS VALUE CHAIN FINDINGS
Certain approaches were applied to more than one of the business value chain processes presented in Section II-A, hence, several studies were assigned to more than one process. The categorization was a manual inspection task that implied understanding the situation and analyzing it in light of the processes of the value chain.
As shown in Table 3, we found that the majority of studies relate to operation processes. For instance, Guaita-Pradas et al. [161] studied the sustainable development of solar power in Valencia, Spain, with the purpose of analyzing geographical regions in terms of their solar radiation impact and the viability of locating photovoltaic panels in those regions. Another example is the study by Huang et al. [105], which focuses on solutions for taxi-related services.
Service was the second most addressed value chain process. We found diverse applications on public transportation services [97], [106], cargo transportation and delivery [102], as well as IT services, in particular to predict the Quality of Service for Web services [173].
Marketing & sales approaches were more focused on customers, i.e., exploiting user information to improve communication and advertising campaigns. For example, some approaches exploit user-generated content that includes geographical information to improve travel marketing decisions [15] and to recommend POIs [79].

C. DECISION IMPACT FINDINGS
We found a total of 138 studies from which we could extract the impact of the proposed solutions on the decision making process for the corresponding business. We characterized this impact using two sub-dimensions: i) Decision impact, to assess, at a high level, the impact of the decision in terms of time and cost; ii) Decision type, to assess whether the solution was used to support the business strategy, or to implement a service for end users.
To characterize the impact of the decisions made with the surveyed approaches, we defined three categories:
• Long term: The decision has a high economic and/or time impact, i.e., the decision implies a significant investment, and/or is expected to last from months to years. For example, building a new facility, making a significant investment, or defining a time-consuming operation.
• Mid term: The decision has a moderate economic and/or time impact, i.e., a moderate investment decision that is expected to last from days to months. For example, changing distribution routes, or adjusting seasonal parameters.
• Short term: The decision has a small economic and/or time impact, i.e., the decision is made immediately, has low individual cost, and is expected to last from minutes to days. For example, a user accepting an item recommendation.
With respect to decision type, decisions made with the surveyed approaches were characterized through the following values:
• Strategic: It refers to a business strategic decision that is taken internally.
• End user: It refers to a functionality provided to end users.
The majority of studies (40.8%) were related to long term decisions. For example, all facility location use cases correspond to this category since there are high costs and expectations associated with new facilities. Mid term decisions constitute 28.8% of the papers, which includes studies associated with travel route planning [15], [70], [141], [142]; prediction of event participants [189]; as well as resource optimization and/or calibration during operation [65], [167], [174], [178]. Finally, short term decisions constitute 30.4% of the studies, where we found recommendation systems applied to telecommunications [163] and logistics [148], [170], among others.
With respect to decision type, the majority of the surveyed studies focus on strategic business decisions (81.5%), such as facility location problems or distribution chain planning. Industries such as biofuel, fuel, health-care, retail, energy, and tourism show great interest in this area. For example, in retail, selecting a shop type given a particular location is an internal strategic decision [17]. Regarding end user oriented functionalities (18.5%), we found diverse applications such as carpooling recommendation systems [103], event organization planning [188], and travel marketing decision planning [15].
Tables 4 and 5 classify the surveyed approaches according to their decision impact and decision type, respectively.

VI. DATA COMPONENT FINDINGS
To answer RQ3, we analyzed the information about data disclosed in the surveyed studies, and documented the most common data sources. Table 6 summarizes data source findings. We found that 48.81% of the studies reported usage of proprietary data sources, whereas 9.52% of the studied papers used surveyed data sources, which were collected by researchers through surveys or computing programs. Another category is synthetic data sources, which correspond to data generated by the researchers. Undisclosed data sources, corresponding to 7.74% of the surveyed papers, are those that were not explicitly mentioned by the authors.
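For readers who prefer absolute numbers, the reported shares can be converted back into paper counts. The sketch below assumes that each percentage is computed over all 168 characterized papers; the resulting counts are inferred rather than stated in the text, and Table 6 remains the authoritative source.

```python
TOTAL_PAPERS = 168  # number of characterized papers

# Shares as reported in the text (percent of surveyed papers).
reported_share = {"proprietary": 48.81, "surveyed": 9.52, "undisclosed": 7.74}

# Inferred counts: nearest whole number of papers for each share.
implied_count = {
    source: round(TOTAL_PAPERS * pct / 100)
    for source, pct in reported_share.items()
}

# Consistency check: each inferred count rounds back to the reported share.
round_trip_ok = all(
    round(100 * implied_count[source] / TOTAL_PAPERS, 2) == pct
    for source, pct in reported_share.items()
)
print(implied_count, round_trip_ok)
```

Each inferred count rounds back to its reported two-decimal percentage, which is consistent with (but does not prove) the assumption that the shares were taken over the full set of 168 papers.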
Several other studies reported using data from LBSNs such as Twitter, Gowalla, Yelp, Dianping, Baidu, Flickr, Foursquare and Tripadvisor. The studies using LBSN data cite the pervasiveness, volume, and information quality as significant factors for their usage.
Other data sources correspond to dedicated location information providers such as the Environmental Systems Research Institute (ESRI), OpenStreetMap, and Google Maps/Earth. ESRI, for example, has a strong presence in business systems. Platforms such as ArcGIS facilitate spatial analysis.
The nature of the variables included in the data sets is very diverse. In facility location problems, particularly for fuel refineries, environmental aspects are crucial for choosing where to extract the natural resources from and for assessing the logistics cost implications of the selected location [80], [81], [83], [84]. In tourism, environmental features such as prices, distances, and landmarks can affect business dynamics and user preferences [16], [135], [137], [138].
Use cases involving business and operational data range from exploiting city dynamics for car-sharing solutions [14] to planning distribution networks and hub-port locations [96], [144], [145]. Retail is a similar case where business data is useful for market and customer analysis [121]- [123], [127].
Demographic data is widely used in solutions that deal with customers from the market analysis perspective, which is expected. However, it was also used in other applications such as health-care facility and equipment location problems [110], [111], [113], as well as urban and transportation analysis [13], [95], [98], [101]. User and item information, usually accompanied by social data, is used as the natural definition of entities in recommendation systems for applications such as coupon recommendations [131] and location-aware news feeds [195].
As expected, economy data was mostly used in approaches related to business performance. Some studies focused on external factors, as in the case of Teng et al. [51], who analyzed the performance of multinational enterprise (MNE) subsidiaries using country-wide economic indicators [171].
In terms of geographic scope, only two studies reported using data of worldwide origin. An et al. [63] studied audiences and customer segmentation for online platforms using data generated from YouTube. The other case was the approach by Liu et al. [173], which used the WS-Dream data set, containing web service log requests labeled with different locations, for a quality of service prediction and service recommendation system. Both cases use a discrete representation of location information, i.e., the location was just a name or label referring to where the event occurred.
Similarly, only two studies reported working within a single venue. East et al. [133] combined GPS traces with survey data from customers visiting the Marwell Zoo in Hampshire, UK, to discover the behavioral patterns of customers within the premises of the zoo. The other case was Cheng and Shen [194], who worked on the concept of venue-aware recommender systems for music, and used a venue-labeled music data set in which songs were tagged with a venue type such as gym, restaurant, or library.

VII. DATA ANALYSIS FINDINGS
This section summarizes the findings related to the data analysis approaches applied by the authors in the surveyed papers.
In order to answer research questions RQ2, RQ4, and RQ5, we characterized these findings in four sub-dimensions: i) problem type, a categorization of the studies by the problem types defined in Section II-C; ii) techniques, a summary of the techniques used by the researchers to solve the addressed problems; iii) metrics and validation strategies, a summary of the metrics and validation strategies reported in the studies to validate their findings; and iv) location information usage, a classification of the studies according to the role played by location context. Fig. 4 presents a taxonomy of the data analysis findings. Columns refer to the four data analysis types, and rows correspond to the problem types. Each cell should be read as the frequently used techniques and data sources for a given combination of a data analysis type and a problem type. For example, we observed that approaches that were applied to location analysis (problem type) were based on optimization (analysis type), and the frequently used techniques were mathematical optimization, genetic algorithms, matrix factorization, and regression.
Findings show that predictive analysis has implementations in almost all of the problem types, with a wide variety of techniques and data sources. This suggests that exploiting location context in the business realm is an active research topic that attracts researchers with diverse backgrounds.
Another notable aspect is the usage of LBSN data across several problem and analysis types. Researchers have shown interest in exploiting LBSN information because of the rich geotagged data it offers. Further details about data analysis findings are presented in the following subsections.

A. PROBLEM TYPE FINDINGS
Table 7 summarizes problem type findings. We found that 50.6% of the studies fall into the location analysis problem type. Within location analysis, the facility location problem stood out as an important problem in different contexts. There is high interest in researching the optimal location to install new facilities, according to industry and business needs, by considering the sustainability and economic viability of the locations as decisive investment factors [11], [12], [80], [87], [91].
The second most addressed problem type was spatial analysis, which represents 32.74% of the papers. All the studies that performed some level of descriptive analysis, inference, or observation of spatial properties of events used spatial statistics techniques to study particular phenomena by exploiting location context. Some cases focused purely on analyzing variables related to location context. Examples include analyzing hotel room price variations according to location context [135], and identifying retail centers and customer interactions according to their intra-market movement patterns [18], [125], [127].
Spatial analysis was also a complementary tool in other cases where a clear business objective was the target of the study. For instance, North and Miller [198] first studied the geographical characteristics of a particular region in Bavaria, Germany, and then formulated a location analysis problem to deploy an entertainment troupe seeking to maximize its audience. Another example is Yang et al. [191], who studied the geographical planning of restaurants in the US, based on socio-demographic characteristics.
POI recommendation has received increasing attention in recent years because of the availability of fine-grained geotagged data and the adoption of GPS-enabled devices and LBSN [66], [72], along with opportunities derived from location-related activities such as tourism, leisure, restaurants, and retail. POI recommendation was the third most frequent problem type, representing 10.12% of the surveyed studies.
As for context-aware recommendation systems, we found 16 papers, and most of them focused on the technical aspects of the problem type. However, several studies focused on specific business applications for recommending, for example, shop-types [17], [129], web or mobile services [162], [173], news feeds [195], social events and gatherings [188], carpooling [103], and music [194].
Under predictive analysis we classified studies that exploit location context to improve particular tasks such as business or service performance prediction [24], [62], and to detect anomalies in services [58], [178], among other applications. Trajectory tracking applications include some level of observation of the routes and movements of the entities of the system, either to be used as an attribute of the study or the actual study subject. Trajectory tracking was common in transportation [101], [103]-[105], tourism [139], and advertising [185], [187].
Under the matching problem type, we found use cases in transportation, more specifically, applications to optimize the match between passengers and transporters [14], [105]. Also, in telecommunications, we found approaches exploiting location context to optimize the association between mobile users and cellular networks [165]. Similarly, we found approaches that exploit location for matching participants in social events [188], [189].
In location-aware advertising, aside from the two cases that also involve trajectory tracking, and location-aware news feeds, we found applications exploiting Twitter data to enable targeted advertising in cities [186], and coupon recommendations, which also exploit social context [131].
With respect to the location-based relationship problem type, we found applications in the construction of communities [28], long-lived associations between users consuming services [165], [173], and the discovery of retail areas according to spatial relationships [130].
Finally, we found only one instance of location estimation: Ahmed et al. [164] studied the estimation of cellular network degradation based on service performance.

B. TECHNIQUE FINDINGS
We found a rich variety of techniques among the surveyed papers, since each study elaborated thoroughly on its implementation details. As a result, we focused on characterizing only the most common ones. Fig. 5 presents the data analysis types used in the surveyed papers. We found that the largest share of the studies used descriptive analysis or unsupervised learning, representing 39.88% of the approaches. The second most used approach was optimization, representing 38.10% of the studies. The third was predictive analysis with 19.71%, followed by inference analysis with 17.86%.
The two most frequent techniques were mathematical optimization and spatial analysis, representing 36.9% and 34.52% of the papers, respectively. Mathematical optimization is commonly used in location analysis problems such as facility location and spatial coverage problems. Spatial analysis is used to analyze and describe scenarios with geographical data.
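To make the facility location setting concrete, the following minimal sketch solves a tiny p-median instance, one classical formulation of the problem, in which p sites are opened so that the total distance from demand points to their nearest open site is minimal. It is an illustration only and is not taken from any surveyed study: the coordinates are hypothetical planar values, the function names are ours, and exhaustive enumeration stands in for the mathematical programming solvers that realistic instances require.

```python
from itertools import combinations
from math import hypot

def total_distance(open_sites, demand_points):
    """Sum, over all demand points, of the distance to the nearest open site."""
    return sum(
        min(hypot(dx - sx, dy - sy) for sx, sy in open_sites)
        for dx, dy in demand_points
    )

def best_sites(candidates, demand_points, p):
    """Try every set of p candidate sites and keep the one with the lowest cost."""
    return min(
        combinations(candidates, p),
        key=lambda sites: total_distance(sites, demand_points),
    )

# Hypothetical planar coordinates (e.g., kilometres on a projected grid).
demand = [(0, 0), (0, 2), (10, 0), (10, 2)]
candidates = [(0, 1), (5, 1), (10, 1)]

chosen = best_sites(candidates, demand, p=2)
print(chosen, total_distance(chosen, demand))
```

Exhaustive search grows combinatorially with the number of candidate sites, so it is only viable for toy instances such as this one.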
Recommendation systems were the third most frequent technique, representing 14.88% of the papers. Similar to location analysis, recommender systems constitute a complete research field rather than a single technique. However, for simplicity, we grouped all the studies that cited the use of recommendation system techniques, such as collaborative filtering, content-based, or hybrid techniques.
Matrix factorization is also a common tool to solve recommendation system problems. Nevertheless, we considered matrix factorization separately because it can be applied in other contexts such as predictive analysis [62], [189] or pattern detection [63]. Also, there are different methods to factorize a matrix, hence we grouped all of them together under this category.
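As an illustration of the general idea, the sketch below factorizes a small user-by-venue rating matrix into two low-rank factor matrices using stochastic gradient descent, one of several possible factorization methods. It is not the method of any particular surveyed paper: the ratings, the hyperparameters (k, lr, reg, epochs), and all function names are hypothetical choices made for this example.

```python
import random
from math import sqrt

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates r."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(P, Q, ratings):
    """Root mean squared error of the factor model on the given ratings."""
    squared = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings]
    return sqrt(sum(squared) / len(squared))

# Hypothetical (user, venue, rating) triples; unobserved pairs are simply absent.
ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 1), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

P, Q = factorize(ratings, n_users=3, n_items=3)
print(round(rmse(P, Q, ratings), 3))
```

The learned factors can then score unobserved user-venue pairs, which is how the technique supports recommendation as well as prediction tasks.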
Clustering techniques are a conventional tool to describe or discover patterns in geotagged entities or events. The fact that geographic information is commonly encoded as numeric coordinates makes it a suitable candidate for clustering analysis. For example, Arbia et al. [116] implemented spatio-temporal clustering techniques to explore geographical aspects of medical device manufacturing industries, and Lloyd and Cheshire [125] employed clustering and KDE techniques to discover geographical center locations for retailing.
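The following sketch shows how coordinate pairs lend themselves to clustering, using a plain k-means (Lloyd's algorithm) as a stand-in for the more elaborate spatio-temporal and KDE-based techniques cited above. It is an illustration under stated assumptions: the points are hypothetical planar coordinates (latitude and longitude would first need to be projected, or a great-circle distance used), and the first k points seed the centroids to keep the example deterministic.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = []
        for c, cluster in enumerate(clusters):
            if cluster:
                updated.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        centroids = updated
    return centroids, clusters

# Hypothetical event locations forming two visibly separate groups.
events = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]

centers, groups = kmeans(events, k=2)
print(centers)
```

In practice the seeding strategy and the choice of k strongly affect the result, which is one reason density-based alternatives are common for geotagged data.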
Artificial neural networks (ANN) have gained considerable attention in the research community since, in several cases, they have demonstrated superior performance to traditional machine learning methods. According to our findings, few studies use ANN compared with the rest of the techniques (only 4.76%). Nevertheless, all the studies that implemented an ANN-based solution focused on different industries and problem types, demonstrating valid solution approaches for their respective settings. In particular, ANN was used for location analysis problems in the hotel [138] and solar farm industries [159], as well as for context-aware recommendation systems [57], location-wise anomaly detection [58], social event recommendations [188], and location-aware advertising [187].
Under Other techniques we grouped the ones that appeared three or fewer times in the surveyed papers. Among these cases we found Latent Dirichlet Allocation [74], [75], [194], Natural Language Processing [15], [105], and Bayesian models [72], [74], [197]. Table 8 presents the characterization of the studies according to the techniques identified in our SLR.

C. METRIC AND VALIDATION FINDINGS
Depending on the problem type and techniques, there are several suitable metrics and validation protocols. In this section we summarize some aspects that we considered relevant with respect to the validation performed for the surveyed approaches.

1) METRIC FINDINGS
We were able to identify metrics in 69 of the surveyed papers. Among the different metrics reported, we categorized and documented the most frequent ones, i.e., metrics that appeared seven or more times among the surveyed studies. The findings are summarized in Table 9. The majority of the reported metrics are associated with predictive analysis approaches, whereas descriptive analysis and inference related studies use statistical measurements such as standard error, p-value, or chi-square tests. Mathematical optimization approaches seldom report metrics usage since these methods work differently than statistical or machine learning-based methods, e.g., maximizing an objective function subject to some constraints yields a single value.
In predictive analysis or supervised learning, the usual approach consists, as the name implies, of predicting a target value using other features that are associated with it. Depending on the model objective, some metrics are more suitable than others. We identified the following metrics:
• Accuracy: It is the fraction of samples correctly classified by the model among all samples.
• Precision: Also known as positive predictive value, it measures the proportion of true positives among all the samples predicted as positive by the model.
• Recall: Also known as sensitivity or true positive rate, it measures the proportion of true positives among all the samples that should have been predicted as positive by the model.
• F1-score: Defined as the harmonic mean of precision and recall, it summarizes model performance by considering both measures.
• RMSE: The root mean squared error is a loss function calculated as the square root of the average squared error per sample.
• MAE: The mean absolute error is a loss function calculated by averaging the absolute errors between the observed values and the values predicted by the model.
• Coverage: It represents the amount of area covered given a certain objective and parameters, e.g., how much area a delivery service can cover using a certain configuration. In certain sub-categories of location analysis problems, it is defined as an objective to maximize.
• Standard error: It is a statistical measure of how much a sample statistic, typically the sample mean, is expected to deviate from the corresponding population value.
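The metrics listed above can be computed directly from their standard definitions, as in the following sketch. The labels and values are hypothetical and the function names are ours; the sketch assumes binary classification with label 1 as the positive class, and the standard error shown is that of the sample mean. Coverage is omitted because its definition depends on the specific location analysis problem.

```python
from math import sqrt

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def standard_error(sample):
    """Standard error of the sample mean, using the unbiased sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return sqrt(variance / n)

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = classification_metrics(labels, predicted)
print(scores)
print(rmse([3, 5, 2], [2, 5, 4]), mae([3, 5, 2], [2, 5, 4]), standard_error([2, 4, 4, 6]))
```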

2) VALIDATION STRATEGY FINDINGS
Validation strategies define the procedures or protocols followed to evaluate a data analysis solution or experiment. Findings are presented in Table 10. In supervised learning approaches, it is common to partition the original data set into training and testing subsets. The former is used for model training, whereas the latter is used to verify how the model behaves with unobserved data. However, the validation strategy to follow often depends on the data analysis type and the techniques used. For example, in mathematical optimization, the optimal solution depends on the set of variables and values used, and partitioning the data could [...]
We were able to identify a validation strategy in 150 of the surveyed papers. The validation strategies identified in the studied papers are characterized as follows:
• Adhoc: This strategy consists of experimenting with the designed and implemented solution by testing it against a real data set or setting, to evaluate how it behaves and whether it delivers the expected results. For example, optimization problems are often evaluated against a real setting.
• Adhoc-descriptive: This strategy follows the same concept as Adhoc, but it is applied in descriptive settings such as spatial analysis. We separated the two because the analysis types and techniques are different.
• Holdout: This strategy consists of splitting the data set into two sets, training and testing, usually 70% and 30% respectively. However, the proportions may vary depending on the setting.
• K-fold cross-validation: This is a more sophisticated strategy that consists of partitioning the data set into k equally sized groups called folds, then iterating over each fold to perform training and testing phases, using a different test subset in every iteration.
• Other: We grouped under this category validation strategies with a usage frequency of two or fewer. We included here: Student's t-test, hypothesis testing, binomial test, and Pearson's chi-squared test.
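The holdout and k-fold strategies described above can be sketched as follows. This is a minimal illustration with hypothetical function names and data; it only produces the data partitions, and the model training and scoring that would use them are left out. The 30% test fraction mirrors the usual proportion mentioned above and is a parameter, not a rule.

```python
import random

def holdout_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_splits(samples, k):
    """Yield (train, test) pairs; each fold serves as the test set exactly once.

    Folds are equally sized only when k divides the number of samples.
    """
    folds = [samples[i::k] for i in range(k)]
    for i in range(k):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, folds[i]

data = list(range(10))  # stand-in for ten labeled observations

train_set, test_set = holdout_split(data)
print(len(train_set), len(test_set))

for train_fold, test_fold in k_fold_splits(data, k=5):
    print(test_fold)
```

With k-fold cross-validation, the metric of interest is typically computed once per fold and then averaged, which gives a less split-dependent estimate than a single holdout.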

D. LOCATION INFORMATION USAGE FINDINGS
This section summarizes the findings related to the usage of location information. We categorized the studies into two classes: i) Model output, which implies that the expected result of the model is a location or place; and ii) Model parameter, which means that the location information was used as an additional parameter to analyze or improve the performance of a model with other purposes. Table 11 presents the findings about the usage of location information. We found that 55.36% of the approaches used location context data as the expected result, whereas 45.24% used it as a model parameter. The two classes overlap because some studies used location information as both model parameter and output. For example, Xu and Chow [195] exploited location information to recommend news feeds to users based on their current locations, and included a mechanism to predict the next location.
With respect to location information as model output, location analysis and POI recommendation are clear examples of problem types that focus on delivering a location as a result. Another problem type that used location information as result or subject of study was spatial analysis.
Among the studies that used location information as model parameter, we found problem types such as predictive analysis. For instance, Bojesen et al. [92] studied approaches to forecast the biogas production in Denmark; and Zhao et al. [62] exploited location context in LBSN to predict user ratings on certain items.

VIII. LIMITATIONS & RESEARCH OPPORTUNITIES
This section summarizes the limitations we faced during the elaboration of this SLR, as well as relevant research opportunities we identified.

A. LIMITATIONS
We encountered some limitations while elaborating this review, particularly with the depth and scope of the information we were able to cover. Most of the characterization was the product of manual inspection and interpretation of the study content. For some studies, it was difficult to derive certain characterization dimensions due to the lack of clarity in the scenarios the researchers described. Furthermore, given the significant variety of approaches and settings, elaborating on more in-depth details would have made this survey excessively extensive. As for depth, the most significant limitations were as follows:
• Given that it is not common to explicitly state business value chain aspects in research papers, the business component dimensions were mostly inferred according to our understanding of the individual cases. Hence, some elements might be limited.
• Given the plethora of solutions presented by the researchers, we had to limit the amount of information to be included in the final report, otherwise the survey would have been too extensive. This also limits the amount of detail that researchers and practitioners interested in particular solutions can gather from our survey. However, the characterization tables we obtained from our SLR are comprehensive enough to guide readers interested in getting further details from the surveyed papers.
With respect to the scope of this SLR:
• We did not include indoor-location studies mainly because the settings differ considerably from outdoor approaches. We recognize the contributions made in this area, and similar surveys could be conducted specifically for indoor locations.
• We acknowledge the importance of privacy, particularly with respect to location context in settings where information about individuals is collected. However, we did not consider privacy implications since we were focused on understanding location analytics.

B. RESEARCH OPPORTUNITIES
We observed an increasing interest in this topic in the research community. This might be the product of the surge in the availability of geotagged information from different sources, which enables researchers to explore scenarios that were previously impossible or too difficult to study. Borrowing from the future work and research opportunities stated in some of the surveyed papers, in the context of exploiting LBSN for context-aware recommendation systems, Korakakis et al. [70] mention the inclusion of semantic analysis, e.g., natural language processing techniques, in the geo-clustering process. Similarly, in the context of targeted advertising, Anagnostopoulos et al. [186] suggest exploring the trajectories of users when designing physical ad campaigns.
In the context of location analysis problems, particularly those that are based on mathematical optimization approaches, since the cases and scenarios are specific, some researchers suggest that their solutions can be tried in other similar industries, and/or can include additional features such as time, and other location-based properties [143], [152], [175], [198].
Other studies reported satisfactory results with their presented solutions and pointed in the direction of validating those approaches with other real-world data sets [24], [172], and exploring deeply other attributes associated with geotagged events [23], [62]. Furthermore, other authors plan to optimize the training of their models, and/or explore other advanced techniques such as deep learning [67], [140]. Others show interest in combining additional information associated with the events that are being studied besides the location data [17], [74].
Additionally, aside from the future directions reported by the authors of the surveyed papers, we observe opportunities in the location-aware advertising field, for example, exploring user-generated content from LBSN to perform geographic market segmentation. Considering that a recurrent problem in marketing is the low effectiveness of advertising campaigns compared to the invested budget, solutions tailored to optimize the effort and resources needed to run marketing campaigns can be of utmost interest for that industry. Similarly, researchers can pursue the elaboration of other surveys that focus on particular aspects such as a single industry or a single problem type. The focus on a single element would allow researchers to expand more on internal details that can improve the visibility of the state of the art.

IX. CONCLUSION
This paper presented a comprehensive characterization of the application of location data analytics in the business value chain, based on the findings of an SLR that we conducted on papers published between 2014 and 2019. This SLR characterizes the recent state of the art for several industries and problem types in which location context is exploited, along with the techniques and data sources that are commonly employed. Findings were organized according to business concepts that are easy to understand, which should help practitioners identify paths to solve existing real-life challenges, and help researchers discover new paths toward unexplored areas.
This study was conducted with the goal of mapping the different settings in which location context and data analytics can be effectively used and exploited in favor of concrete business value chain processes. An additional goal was to help researchers and practitioners identify recent approaches and research opportunities. The main results provide an understandable and systematic characterization of the industries in which location data analytics is being exploited in specific business segments, as well as a general understanding of the data sources, techniques, and experiment protocols used to implement the solutions. In this way, this SLR allowed us to systematically answer the research questions that motivated it.
In spite of the scope of this study, and given the significant diversity of approaches and applications, it is infeasible to derive definitive conclusions about the approaches, effectiveness, and peculiarities of every aspect of the settings we have found. One aspect that hampers a comprehensive generalization is the heterogeneity of the approaches, which makes them difficult to compare. Nevertheless, it is possible to conclude that metrics and validation strategies could be improved so that a comprehensive benchmark of the studies can be established, particularly for location and spatial analysis.