Design of a Support System for Complicated Logistics Location Integrating Big Data

Logistics location is an important component of logistics planning that affects traffic pressure and vehicle emissions. To date, there has not been an adequate study of the integration of big data into the location for a complicated logistics system. This study developed a decision support system that can address location problems for complicated logistics systems, e.g., a multilevel urban underground logistics system (ULS), using logistics big data. First, information needed in the logistics location, such as the traffic performance index (TPI) and the origin/destination (OD) matrix, was collected and calculated using a big data platform, and this information was digitized and represented based on a geographic information system (GIS) tool. Second, a two-stage location model for a ULS was designed to balance the construction costs and traffic congestion. The first stage is establishing a set-covering model to identify optimum locations for secondary hubs based on the ant colony optimization algorithm, and the second stage is clustering of the secondary hubs to determine locations for primary hubs using the iterative self-organizing data analysis technique algorithm (ISODATA). Finally, the Xianlin district of Nanjing, China, was chosen as a case study to validate the effectiveness of the proposed system. The system can be used to facilitate logistics network planning and to promote the application of big data in logistics.


Introduction
With the rise of e-commerce and the growth of urbanization in various countries, urban logistics has become critical in ensuring the quality of people's lives and the sustainability of city development [1]. On the one hand, the demand for logistics services is growing rapidly, and the delivery efficiency requirement has become higher [2]. For example, the annual express delivery business growth rate in China exceeded 50% in 2016 [3]. On the other hand, the increasing freight volumes in limited urban areas exacerbate traffic congestion, which has an inevitable impact on energy consumption and environmental pollution. In the Guidelines for National Greenhouse Gas Inventory, it was indicated that petrol consumption in traffic jams is almost twice that during normal driving [4]. Moreover, as traffic congestion increases, CO2 emissions [5] and PM2.5 [6] increase. ULS is a complicated and systematic task for which optimization algorithms have been designed considering various parameters, such as costs, distances, regional congestion, and freight volumes [9][10][11][12]. However, the current location approaches assume that important parameters, such as regional congestion and freight volumes, are known parameters. These parameters, which are obtained using traditional statistical methods, may not reflect the real circumstances. As a result, the accuracy of the selected logistics location is undermined by the errors in the parameter values. It is challenging to accurately and promptly collect and process massive amounts of data that are related to these parameters and are produced in the process of daily transportation and logistics.
Big data has been implemented in logistics, and various applications have shown tremendous value [13]. It is characterized by a high data volume, a rapid data flow (velocity), and diverse data (variety) [14]. With the maturity of information technologies, the collection, storage, computation, and visualization of big data can basically meet the requirements [13], while its core values include integrating data, extracting the features of data, and recognizing patterns from the features to support decision-making. In logistics systems, data from a variety of sources, such as the Global Positioning System (GPS), the Internet of Things (IoT), or traffic conditions, are time-variant and large. In logistics, big data has been utilized to optimize crew and vehicle routing, predict consumer demands, and optimize warehousing layout networks [13,15]. However, studies regarding the utilization of logistics big data for complicated logistics locations are limited. Since logistics big data is needed to make decisions about complicated logistics systems, integrating it into a decision support system is challenging. The aim of this study is to fill the literature gap, i.e., the fact that there is no integrated decision support system for complicated logistics location problems in which big data and geographic information system (GIS) data are efficiently integrated.
The system is composed of logistics data collection, data processing architectures, data storing services, location optimization algorithms, and data visualization. The location optimization algorithms combine the set-covering problem (SCP) and clustering to solve the multistage hub location problem in a limited-capacity network based on the operational mechanism design of a hierarchical ULS network. The main contributions of this study are the following: (i) A big data platform is designed that integrates data from various sources, e.g., GPS, applications (APPs) in smartphones, GIS, and third-party data providers, to support decisions concerning ULS hub locations. (ii) A two-stage location optimization model is developed for a complicated ULS network that produces a more scientific layout. The model takes traffic congestion into consideration to estimate the actual demand of the ULS network. Moreover, the model is applied to a city in China in order to validate the effectiveness of the model.
This study integrates location selection into a big data platform for large e-commerce or logistics companies for actual logistics planning. It realizes the application of big data to the optimization of complicated logistics systems. Moreover, due to the ability of the ULS network to speed up delivery time and alleviate traffic congestion, the study provides a basis for the sustainable development of the urban transportation system.

Big Data in Logistics Systems.
In recent years, there have been many attempts made to improve logistics performances using the advancements in information technologies. Logistics ontologies were designed to systematically formalize domain knowledge of logistics [16], especially optimization-related knowledge [17]. IoT technology has been proposed to improve the productivity and efficiency of logistics management. For example, radio frequency identification (RFID) sensors and barcode sensors were attached to cargo to assist the operation flow of cargo inventory [18]. In addition, RFID technology and wireless sensors were combined to track and trace parts, semifinished goods, and finished goods to provide highly flexible information updating for order changes and picking problems [19]. Using the positioning capability of GPS and the tag identification capability of RFID, information about vehicle lines, logistics distribution, and the origin/destination can be directly managed to support space decisions in logistics [20]. Another useful tool in logistics is GIS, which captures, manages, analyses, and displays all forms of geographically referenced information. As spatial location information is useful in logistics, GIS is applied to assist with the positioning of facilities and vehicle route planning [20]. The application of these information technologies produces large-scale logistics data for various scenarios and operations. The significance of big data technology lies in mining the hidden values of the big data, rather than mastering huge amounts of information [13,21].
In transportation and logistics, big data has received more attention in recent years, and research on its practical implementation has been conducted. Cell phone location data and license plate recognition data were collected and fused to predict traffic flow using a zero-shot transfer learning model [22]. A highly skewed speed-density dataset was processed using reproducible sample generation and the least squares method to obtain accurate traffic flow fundamental diagrams for various traffic flow conditions [23]. Inspection records for port state control have been used to predict the number of deficiencies each inspector can identify for each ship [24] and the ship detention probability [25]. Zhong et al. [26] introduced RFID-Cuboids to represent the logistics information and mined the frequent trajectory from the cuboids. The frequent trajectory knowledge can then be used to determine the logistics plans and the layout of distribution facilities. Hopkins and Hawking [27] documented the integration of big data analytics and the IoT to improve driver safety, lower operating costs, and reduce the environmental impact of vehicles for a large logistics firm. A combination of truck telematics and geospatial information was used to monitor dangerous roads, driving conditions, and driver behaviour and to send alerts to drivers [27]. Furthermore, congestion maps developed from truck data records (GPS locations and dates) and road data records (usage of roads and dates) were used to optimize the schedules and routes of trucks in order to reduce congestion. Zhao et al. [28] analysed the management model, influencing factors, and development status of the B2C e-commerce logistics distribution, using big data, and optimized the resource allocation taking into consideration production sales and logistics distribution based on a big data platform. Shahparvari et al. [29] identified the optimal location for a logistics hub at a larger geographical scale using GIS.
Thirty-four criteria related to the spatial-structural and functional attributes of the logistics hub location were input into the GIS, and the values associated with the performance of alternatives based on the criteria were automatically output [29]. In summary, different types of logistics big data models have been investigated for different scenarios and logistics processes. However, few studies have developed an integrated platform that includes logistics data collection, storage, optimization, and visualization. Similar to Shahparvari et al.'s study [29], this study focuses on the location selection of logistics hubs using GIS; nevertheless, ULS networks are more complicated than the logistics centres in Shahparvari et al.'s study [29].

Logistics Location for ULS Networks.
ULS is an effective approach for alleviating the negative effects of urban freight traffic, improving the efficiency and safety of urban logistics, and saving the ground space [30]. In recent years, Switzerland, Italy [8], and China [31] have actively prepared for the construction of underground logistics networks. Given the uniqueness of the transport mechanism and because the service capacity of hubs and channels is restricted by the underground space, the traditional location-allocation method used for ground logistics networks is not suitable for ULS networks. In the following, we briefly review the model construction and method selection in order to define the boundary of the urban underground logistics hub location problem (ULHLP). As a multilayer network problem, the ULHLP has been addressed by various researchers using a range of methods. The optimal ULS nodes, tunnel layout, and transport route network flows were formulated via a bi-objective mixed-integer linear programming model considering minimal costs and maximal system utilization [9]. Liang et al. [10] established a multi-objective ULS network planning model, including hub location and tunnel linking, and used agglomerative hierarchical clustering to determine the location of the first-level hubs and a greedy algorithm (GA) to determine the locations of the second-level hubs covered by each first-level hub. Ren et al. [11] constructed a set-covering problem (SCP) model to determine the locations of the first-level hubs, and their results were optimized according to freight volume and the cargo handling capacity of the hubs. Tunnel length, regional congestion, and weighted distance were considered to determine whether to set second-level hubs [11]. He et al. [12] used the SCP to determine the number of underground logistics centres under a redefined underground logistics structure; then, a 0-1 planning model was used to connect the logistics centre with a distribution service centre.
An improved bat algorithm was used to solve the layout model of the centres [12]. In previous studies, most two-layer hub-locating studies have been conducted from top to bottom, with calculations based on road freight data or standardized data. This means the calculated network carrying capacity may be much larger than the actual demand. The solution to congestion has been to add hubs or establish tunnel connections in the congestion area. While this can effectively solve local congestion, it lacks overall optimization for the entire area. The SCP and clustering approaches are often used to determine the location of facilities to solve the problems of facility locations and customer allocation [32]. The SCP model can effectively deal with the strategy of selecting the minimum and optimal locations from candidate nodes and can provide service for the maximum demand in the region [33,34]. The exact method of the SCP is inefficient due to the significant time needed for large problems [35]. As a result, heuristic algorithms, such as the GA [36,37] and the ant colony optimization (ACO) [38][39][40], were proposed to solve the SCP. Compared to the GA, which is prone to being trapped in local optima in the SCP, the ACO can obtain better results [41]. Clustering approaches have been widely used in the division of spatial regions, and they are useful for classifying demand points according to certain characteristics. O'Kelly [42] used clustering to minimize the sum of the squared deviation from the clustering mean based on a set of interacting spatial points. A probabilistic clustering method was proposed to solve the multifacility location problem, and the probability at each iteration depends on the travel costs based on hub locations [43]. The iterative self-organizing data analysis technique algorithm (ISODATA) can effectively eliminate transmission errors through the dynamic correction of the cluster centre [44,45].
This gives it an advantage over k-means clustering without a given cluster quantity, which meets the requirement of basic ULS hub division.

Materials and Methods
The transportation process of a ULS-embedded urban logistics system can be organized into a three-layer network (Figure 1). In the first layer, goods are transported via underground tunnels (pipeline) from logistics parks (LPs) outside of the city to the hubs in the ULS network. Such hubs are defined as primary hubs (PHs). In the second layer, depending on the flow direction, the goods are transferred from the PHs to hubs serving the logistics demand point (DP). These hubs, which serve the DP, are defined as secondary hubs (SHs). The SHs are not connected to the LPs. In the third layer, the goods are transported from the SHs to the ground and finally to the customer via ground transportation (e.g., by hand or minivan), in such a way that the ground traffic load does not increase. The reverse process is also feasible, that is, goods flow from the SHs to the LPs.

Data Preparation.
Since the ULS networks cooperate with traditional freight transportation systems to deliver goods from their origins to their destinations with the aim of alleviating traffic congestion problems, the optimization of the hub locations begins with determining two factors, i.e., the daily traffic congestion and the one-day freight volume with the origin/destination (OD). This step consists of data collection and analysis, and the values of the two factors are input into the ULS hub location model as influencing constraint conditions. Traffic congestion is measured using the traffic performance index (TPI) in China [46]. The required data to calculate the TPI include the set of roads in a certain region A; the road grade j; the length of a road l; the average speed on a road v; and the traffic flow on a road x. The details of the calculation method can be found in the literature [47]. The TPI is calculated every 15 min during the morning peak (from 7:00 to 9:00) and the evening peak (from 17:00 to 19:00) on workdays, and thus, the values of v and x of each road in A must be updated every 15 min. The remaining variables, i.e., A, j, and l, can be extracted from the Amap API (https://lbs.amap.com/) that is used as GIS software. Amap also provides v in real time, while x can be collected from DiDi (https://github.com/didi). The freight OD matrix in which each element represents the quantity of the goods delivered from an origin region (row) to a destination region (column) is estimated using data collected by a logistics tracking system. The logistics tracking system is developed to collect and manage the information of the present location of each delivery item and is supported by a combination of technologies, including the quick response (QR) code, GPS, scanning equipment, and GIS. The QR code is attached to every item and stores the information about the item, such as the item id, origin address, destination address, and weight.
When goods are loaded for transportation, a scanner is used to scan the QR code and to connect the item id to the vehicle id. Given that each vehicle is equipped with a GPS receiver and the vehicle id is connected to the item id, real-time location data for goods can be recorded and stored. Moreover, GPS coordinates are converted into Amap coordinates using the Amap API, and then, regional information can be inferred from the Amap coordinates. The OD information stored in the QR codes and that inferred from the GPS are fused to ensure accuracy. The freight OD matrix is obtained by grouping the delivered goods according to their origin and destination regions and by summing the weights of the goods in each group.
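The grouping-and-summing step can be illustrated with a minimal sketch. The record fields (`origin`, `destination`, `weight`), the region codes, and the function name `build_od_matrix` are hypothetical and chosen only for illustration; they are not taken from the tracking system described above.

```python
from collections import defaultdict


def build_od_matrix(items, regions):
    """Sum item weights per (origin region, destination region) pair.

    Rows are origin regions and columns are destination regions.
    """
    index = {region: pos for pos, region in enumerate(regions)}
    totals = defaultdict(float)
    for item in items:
        totals[(item["origin"], item["destination"])] += item["weight"]

    matrix = [[0.0] * len(regions) for _ in regions]
    for (origin, destination), weight in totals.items():
        matrix[index[origin]][index[destination]] = weight
    return matrix


# Hypothetical delivery records; region codes and weights are made up.
records = [
    {"item_id": "a1", "origin": "R1", "destination": "R2", "weight": 1.5},
    {"item_id": "a2", "origin": "R1", "destination": "R2", "weight": 2.0},
    {"item_id": "a3", "origin": "R2", "destination": "R3", "weight": 0.5},
]
od = build_od_matrix(records, ["R1", "R2", "R3"])
```

In this sketch, the two items travelling from R1 to R2 end up in the same cell, so that cell holds their combined weight.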

Big Data Platform Architecture.
This section proposes a big data architecture and an analytic framework to support ULS hub location selection. Because of the high cost and complexity of ULS networks, large e-commerce enterprises (e.g., Amazon.com and JD.com) and logistics companies (e.g., FedEx and SF Holding) are the most likely to plan, construct, and maintain them. The support system for ULS hub location selection can therefore be integrated, together with the existing logistics information systems of these enterprises, into a big data platform in order to support flexible logistics processes. Figure 2 presents the architecture framework. The overall architecture is organized into six parts: the data source, data pipeline, data processing, data storage, data application, and data service layers. The data source layer comprises the data produced using different techniques, such as the QR code, GPS, text input to APPs in smartphones, sensors, and data from third parties, during logistics processes. Kafka is a distributed event-streaming platform for high-performance data pipelines, and it has been adopted in logistics big data platforms [48,49]. Kafka distinguishes between producers and consumers to improve scalability. The data sources are producers that publish events that are organized and stored by topics on Kafka. These topics include average speed, location, and traffic flow. Spark and Hadoop receive the records on the topics in Kafka separately for front-end applications, e.g., ULS hub location selection, logistics tracking and tracing, and logistics route optimization. Optimization algorithms (OAs) and artificial intelligence (AI) technologies for these applications are implemented in Spark or Hadoop depending on the real-time requirement. Hadoop processes large amounts of static data collected over time [48].
The two core components of Hadoop are the HDFS (Hadoop distributed file system), which is responsible for storing data, and the MapReduce paradigm, which distributes the processing of the data across many servers. In contrast, Spark gathers and processes the data dynamically as they appear [48].
In the case of ULS hub location selection, Hadoop was selected because the calculation is based on historical data, and the algorithms implemented in Hadoop for the ULS hub location model are presented in Section 3.3. However, regarding route optimization for driving vehicles, Spark is more appropriate since the optimal route should be calculated as quickly as possible. Finally, the results of the calculations in Spark or Hadoop are stored and sent to corresponding applications. Moreover, a GIS provides geographic information, such as road length, roads in a region, and the GPS coordinates in a region, to the ULS hub location model and visualizes the results of the hub locations.
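The separation between producers, topics, and independent consumers can be sketched with a small in-memory model. This is only an illustration of the pattern under simplifying assumptions: the class `TopicLog`, its methods, and the event fields are hypothetical, and the code does not use or imitate the actual Kafka, Spark, or Hadoop APIs.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for a topic-based event log (not the Kafka API)."""

    def __init__(self):
        self._events = defaultdict(list)  # topic -> ordered list of events
        self._offsets = defaultdict(int)  # (consumer, topic) -> next index

    def publish(self, topic, event):
        """Producer side: append an event to a topic."""
        self._events[topic].append(event)

    def poll(self, consumer, topic):
        """Consumer side: return the events this consumer has not read yet."""
        start = self._offsets[(consumer, topic)]
        new_events = self._events[topic][start:]
        self._offsets[(consumer, topic)] = len(self._events[topic])
        return new_events


log = TopicLog()
log.publish("average_speed", {"road": "r1", "kmh": 32.0})
log.publish("average_speed", {"road": "r2", "kmh": 18.5})

# A streaming-style consumer reads events as they arrive, whereas a
# batch-style consumer reads the accumulated history in one pass.
stream_first = log.poll("stream", "average_speed")
log.publish("average_speed", {"road": "r1", "kmh": 29.0})
stream_second = log.poll("stream", "average_speed")
batch_all = log.poll("batch", "average_speed")
```

Because each consumer keeps its own read position, the batch-style reader still sees the full history even though the streaming-style reader has already consumed it, which mirrors how the two processing engines can receive the same topics separately.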

Two-Stage ULS Hub Location Model.
To simplify the model, the following assumptions were made: (1) Since the freight demand per region is too small in some regions, it is deemed to cover the entire demand region if the service range (SR) of the hub covers the DP coordinate. [...] hub in regard to the delivery or transport of goods, including sorting, handling, and other operations. (6) The PH has an unlimited cargo handling capacity (CHC).
A two-stage hybrid set-covering and clustering model was proposed for ULS hub cluster classification and location. First, based on the regional ground freight OD data and the TPI discussed in Section 3.1, the possible underground freight OD data undertaken by the ULS network were calculated and generated. Second, a set-covering model was established to obtain the minimum SH number that can cover all of the areas under a given SR. Third, the number and location of PHs were determined by clustering the freight volume of the SHs and the distance from the PH to the SHs it covers.

OD Matrix for ULS.
Based on the readily available freight OD matrix, this study proposes a calculation method using the regional TPI to generate an OD matrix for underground freight, which is the basis for the ULS network calculation. By weighting the TPI and the underground construction cost, the TPI values of all the demand regions can be reduced to $\eta_0 = 4$, which indicates that the traffic congestion state is clear. In addition, the TPI is assumed to have an approximately linear relationship with the regional freight volume [30]. In this way, according to the reduction rate of the TPI, a regional freight volume of the same proportion can be transferred underground. Thus, the freight volume to be transferred underground in region $i$ is given by equation (1), where $Q_i$ is the freight volume of region $i$; $OD(i, k)$ is the element in the $i$th row and the $k$th column of the OD matrix; and $\eta_i$ is the TPI of region $i$. Considering the adverse effect of the large number of decision variables [50] and the influence of $OD(i, k)$ on the TPI of origin region $i$ and the TPI of destination region $k$, the OD matrix of the underground freight can be simplified to equation (2), where $OD_u(i, k)$ is the freight volume transported from region $i$ to region $k$ that needs to be transferred through the ULS. When the TPI of origin region $i$ or destination region $k$ is larger than $\eta_0$, the larger TPI of $i$ and $k$ is used to calculate the value of $OD_u(i, k)$; otherwise, $OD_u(i, k) = 0$ is used.
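The rule for the underground OD matrix can be sketched as follows. The exact form of the equations is an assumption here: the sketch takes one reading consistent with the description, namely that $OD_u(i, k) = OD(i, k)\,(\eta_{\max} - \eta_0)/\eta_{\max}$ with $\eta_{\max} = \max(\eta_i, \eta_k)$ when $\eta_{\max} > \eta_0$, and $OD_u(i, k) = 0$ otherwise. The function name and the sample values are hypothetical.

```python
def underground_od(od, tpi, eta0=4.0):
    """Scale each OD entry by the TPI reduction rate of its more congested end.

    Assumed reading: the share moved underground equals the proportional
    reduction needed to bring the larger of the two TPI values down to eta0.
    """
    n = len(od)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            eta_max = max(tpi[i], tpi[k])
            if eta_max > eta0:
                result[i][k] = od[i][k] * (eta_max - eta0) / eta_max
    return result


# Hypothetical three-region example (volumes in tons, TPI values made up).
od = [
    [0.0, 100.0, 20.0],
    [50.0, 0.0, 40.0],
    [10.0, 30.0, 0.0],
]
tpi = [8.0, 3.0, 2.0]
od_u = underground_od(od, tpi)
```

In this example only flows that start or end in the congested first region are partly moved underground; flows between the two uncongested regions stay on the surface.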

SH Location Based on the ACO Algorithm.
According to the assumptions, all of the SHs have a maximum service radius, which is the maximum coverage of one hub as a subset. Since the distance from the DP to the SH does not exceed the service radius of the SH and hub construction costs far exceed the transportation costs, the SH construction cost is taken as the total cost. Since the total construction cost of hubs is proportional to the quantity, the objective function can be established to minimize the total cost. The SCP model is as follows:

$$\min \sum_{j \in M} c_j x_j$$

subject to

$$\sum_{j \in M} a_{ij} x_j \geq 1, \quad \forall i \in N,$$

$$x_j \in \{0, 1\}, \quad \forall j \in M.$$

In the model, $c_j$ is the construction cost of SH $j$, which can be set to 1 since the construction of each SH costs the same. If DP $i$ is covered by SH $j$, then $a_{ij} = 1$; otherwise, $a_{ij} = 0$. $N$ is the set of DPs in the district, $N = \{1, 2, \ldots, n\}$. $M$ is the set of candidate nodes for the SH, $M = \{1, 2, \ldots, m\}$. $d_i$ is the freight demand volume of DP $i$, and $D_j$ is the CHC of SH $j$. $V_{ii'}$ is the freight volume transported from DP $i$ to DP $i'$. $A(j)$ is the set of DP $i$ covered by SH $j$. $B(i)$ is the set of SH $j$ that can cover DP $i$. The objective function (equation (3)) describes the SCP model with the goal of achieving the lowest total cost for the hubs. The constraint described by equation (4) ensures that each secondary hub covers at least one DP. Equation (5) means that each DP can only be serviced by one SH. The constraint described by equation (6) indicates that the total freight volume of the DPs covered by the SH cannot exceed its capacity. The model is an SCP with nonlinear constraints rather than a simple SCP. It is used to cover all of the demand regions with the least number of SHs, which reduces the large construction investment and the difficulties in the early stage of ULS construction. The principle of the ACO is to mimic the foraging behaviour of ant colonies based on information exchange. Each candidate node is regarded as a node on the path.
The ACO in this study is described as follows:
(i) Initialization: $m$ ants are randomly placed on $n$ candidate nodes, that is, each ant randomly selects a candidate node as the starting node of the path. The initial pheromone of each candidate node is set to be the same.
(ii) Probabilistic choice: using a random probability selection mechanism, ant $k$ selects a subset $S_i$ in $S$ with probability $P$. $P$ is calculated using equation (10), where $\eta_i(t) = |C_i|$ and $|C_i|$ is the number of nodes in subset $S_i$.
(iii) Update the pheromone: after each iteration, the pheromone of the path through which the ant passed is increased. At $t + 1$, the pheromone of each subset is adjusted by $\Delta\tau_j(t, t + 1)$, the amount of pheromone left in solution set $j$ by ant $k$ at $t + 1$.
(iv) Termination condition for the ACO: it is difficult to obtain the optimal solution of the SCP in practical applications, so a given number of iterations is usually taken as the termination condition for the algorithm. For example, the algorithm terminates when the quality of the solution is no longer improved after $N$ consecutive iterations.
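The four steps above can be sketched as a small program. This is a simplified illustration, not the implementation used in the study: the capacity and single-assignment constraints are omitted, the heuristic value is taken as the number of still-uncovered points in a subset, and the evaporation rate, deposit rule, and all function and parameter names are assumptions.

```python
import math
import random


def coverage_sets(points, candidates, radius):
    """For each candidate hub, the set of demand points within the radius."""
    return [{i for i, p in enumerate(points) if math.dist(p, c) <= radius}
            for c in candidates]


def aco_set_cover(subsets, n_points, n_ants=10, alpha=1.0, beta=2.0,
                  rho=0.1, patience=20, max_iter=200, seed=0):
    """Return indices of subsets that together cover points 0..n_points-1."""
    if set().union(*subsets) != set(range(n_points)):
        raise ValueError("candidates cannot cover every demand point")
    rng = random.Random(seed)
    tau = [1.0] * len(subsets)  # (i) equal initial pheromone
    best, stale = None, 0
    for _ in range(max_iter):
        solutions = []
        for _ant in range(n_ants):
            uncovered = set(range(n_points))
            chosen = []
            while uncovered:
                # (ii) probabilistic choice among subsets that still add
                # coverage, weighted by pheromone and by new coverage.
                options = [j for j in range(len(subsets))
                           if subsets[j] & uncovered]
                weights = [tau[j] ** alpha
                           * len(subsets[j] & uncovered) ** beta
                           for j in options]
                j = rng.choices(options, weights=weights)[0]
                chosen.append(j)
                uncovered -= subsets[j]
            solutions.append(chosen)
        # (iii) evaporate, then deposit more pheromone for smaller covers.
        tau = [(1.0 - rho) * t for t in tau]
        for chosen in solutions:
            for j in chosen:
                tau[j] += 1.0 / len(chosen)
        # (iv) stop after `patience` iterations without improvement.
        iteration_best = min(solutions, key=len)
        if best is None or len(iteration_best) < len(best):
            best, stale = iteration_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return sorted(best)


# Hypothetical demand points on a line (km); every point is also a candidate.
points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0),
          (6.0, 0.0), (8.0, 0.0), (10.0, 0.0)]
subsets = coverage_sets(points, points, radius=3.0)
hubs = aco_set_cover(subsets, len(points))
```

In this toy instance the smallest cover uses two hubs. Adding the capacity constraint would require tracking the freight volume assigned to each chosen hub while an ant builds its solution.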

PH Location Based on ISODATA Clustering.
SHs can be grouped based on certain rules to locate the PH. A combination of the clustering method and the improved ISODATA was used to solve the PH localization problem. The regional centre formed by the clustering and grouping of the SHs is the PH's position.
The positioning of the PH should minimize the overall transportation costs of the ULS network, which is related to the freight volume and the transportation distance. The PH should be set in an SH area with a large and concentrated freight volume. Therefore, the clustering model considers the freight demand of the SH and the distance between the SH and the PH. The distance between hubs is the Euclidean distance [39]. In the objective function of the clustering model, $w_j$ is the freight demand of SH $j$ and $s_{jk}$ represents the distance between SH $j$ and PH $k$. $I$ is the set of SHs in the region, and $J$ is the set of PHs. The ISODATA has advantages in terms of automatic calculation and the acquisition of a reasonable number of clusters [40]. The distance in the ISODATA is calculated by weighting $h_{jk} = w_j d_{jk}$, where $h_{jk}$ is the clustering distance from SH $j$ to the cluster centre $k$; $w_j$ is the freight volume of SH $j$; and $d_{jk}$ is the Euclidean distance between $j$ and $k$.
Step 1. Identify and determine some of the initial values that can be artificially modified during the iteration. $N$ samples $x_c$, $c = 1, 2, \ldots, N$, are allocated to the clusters according to the initial values. If the number of samples in $S_\lambda$ is less than $\theta_N$, $S_\lambda$ should be deleted, and $N_c = N_c - 1$.
Step 2. Calculate the distance index function of all of the samples.
(1) Modify the cluster centre.
(2) Calculate the average distance between each sample and the cluster centre in each cluster domain $S_\lambda$.
Step 3. Perform the split operation.
(1) Calculate the standard deviation vector of the sample distance of each cluster.
Step 4. Perform the merge operation.
(1) Calculate the distance between each cluster centre.
(2) Compare $D_{c\lambda}$ with $\theta_c$, and sort the values of $D_{c\lambda}$ ($D_{c\lambda} < \theta_c$) from small to large. The distance between the paired centres is $D_{i_k j_k}$, and a new centre is calculated for each merged pair.
Step 5. Iterate again and judge whether the clustering results meet the requirements. After several iterations, if the result converges, the operation exits, and the result is preserved.
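The delete, update, and merge steps can be sketched in a reduced form. This is not the improved ISODATA of the study: the split operation (Step 3) is left out, the centre update is assumed to be a freight-weighted centroid, at most one pair is merged per iteration, and all names, thresholds, and sample values are hypothetical. Weights are assumed to be positive.

```python
import math


def weighted_isodata(points, weights, centres, theta_n=1, theta_c=1.0,
                     max_iter=50):
    """Simplified ISODATA-style clustering without the split step."""
    centres = [tuple(c) for c in centres]
    for _ in range(max_iter):
        # Allocate every sample to its nearest cluster centre.
        groups = [[] for _ in centres]
        for idx, p in enumerate(points):
            nearest = min(range(len(centres)),
                          key=lambda k: math.dist(p, centres[k]))
            groups[nearest].append(idx)
        # Delete clusters that hold fewer than theta_n samples.
        groups = [g for g in groups if len(g) >= theta_n]
        if not groups:
            raise ValueError("theta_n removes every cluster")
        # Move each centre to the freight-weighted centroid of its cluster.
        new_centres, masses = [], []
        for g in groups:
            mass = sum(weights[i] for i in g)
            new_centres.append(tuple(
                sum(weights[i] * points[i][d] for i in g) / mass
                for d in range(2)))
            masses.append(mass)
        # Merge the closest pair of centres if they are nearer than theta_c.
        pairs = [(math.dist(new_centres[a], new_centres[b]), a, b)
                 for a in range(len(new_centres))
                 for b in range(a + 1, len(new_centres))]
        if pairs:
            gap, a, b = min(pairs)
            if gap < theta_c:
                total = masses[a] + masses[b]
                merged = tuple(
                    (masses[a] * new_centres[a][d]
                     + masses[b] * new_centres[b][d]) / total
                    for d in range(2))
                new_centres = [c for k, c in enumerate(new_centres)
                               if k not in (a, b)] + [merged]
        if new_centres == centres:  # converged: centres no longer move
            break
        centres = new_centres
    return centres


# Hypothetical SH coordinates (km) and freight volumes (t).
sh_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sh_weights = [1.0, 3.0, 2.0, 2.0]
ph = weighted_isodata(sh_points, sh_weights, [(1.0, 1.0), (9.0, 1.0)])
ph_merged = weighted_isodata(sh_points, sh_weights,
                             [(1.0, 1.0), (9.0, 0.0), (9.0, 2.0)],
                             theta_c=3.0)
```

Weighting the centroid by freight volume pulls each PH towards the SHs with the largest volumes, which matches the stated aim of placing the PH where freight is large and concentrated. In the second call, the two nearby initial centres are merged, so both runs end with two PHs.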

Results and Discussion
The proposed methodology was validated through a case study of the selection of ULS hub locations in the Xianlin district of Nanjing, China. Figure 3 shows the freight transportation map of the Xianlin district in Amap, which can be used as a GIS tool. The black squares in the figure not only represent the geographical centre of the 110 regions officially included in the urban planning but also correspond to freight DPs in these regions. The number represents the code of the region and its DP. Four LPs are located in different directions outside of the district. The TPI of each region is displayed in Figure 4. Using different colours corresponding to different TPI levels, Figure 4 shows that a majority of the regions are congested. A 114 × 114 OD matrix represents the volumes of the goods transported between the 114 regions, including 110 DPs and 4 LPs. Since rows correspond to origin regions and columns to destination regions, the sum over each row corresponds to the goods leaving the region and is illustrated in Figure 5(a), and the sum over each column corresponds to the goods entering the region and is illustrated in Figure 5(b).
Based on equation (2), when $\eta_0 = 4$, the total underground freight volume is 67,112 tons, accounting for 41% of the original freight volume. The CHC and SR of the underground logistics hub were temporarily set as 3000 tons and 3 km, respectively, to calculate the location-allocation. In Section 4.3, the calculation results for different combinations of these two parameters are further compared and discussed.

Optimization Results for SH Locations.
When SR = 3 km and CHC = 3000 t, 40 SHs were selected by the ACO (Figure 6, in which red circles represent SHs selected from DPs). Figure 6 provides an example of 791 DPs that were selected as the SHs, in which the circle indicates the service coverage centred on the SH and the covered regions are represented by the blue squares inside the circle. The actual freight volume (AFV) of an SH is defined as the sum of the freight turnover volume of the DPs covered by the SH. Moreover, another important index, i.e., the average saturation of the cargo handling capacity (ASCHC), was used to measure the actual freight handling performance of the SHs, which can be calculated using the following equation:

$$\mathrm{ASCHC} = \frac{\sum_{i=1}^{n} \left(\text{actual freight turnover}_i / \text{cargo handling capacity}_i\right)}{n} \times 100\%, \quad \mathrm{ASCHC} \in (0, 1),$$

where n is the number of SHs. The larger the ASCHC, the higher the hub utilization and the better the ULS network performance. In contrast, a low ASCHC indicates that the designed capacity of the hub is far larger than the actual demand, which causes substantial waste. Table 1 shows the SH sets and their coverage regions calculated by the ACO; the summary is shown in Table 2. The average single runtime for the ACO was 414.5400 s.

There are three reasons why some hubs in the results serve only one DP. First, the distance from the other nearest DPs to the hub is more than its service range. There is only one such hub in this case (number 896). Second, the capacity of a hub can only meet the demand of the region where it is located. For instance, DP 886 has a freight volume of 3088.07 t (over 3000 t) in the ACO result, and thus, it requires the help of the SHs nearby to meet this volume, including SHs 885, 890, and 891, which are dedicated to it. Third, following the logic and search order of the chosen algorithm, some of the hubs were generated through local optimization. Three SHs (nos. 808, 825, and 892) in the ACO result remained after the points nearby were searched and covered. The strength of the ACO is that its pheromones enable individuals to communicate indirectly through the environment, and its probabilistic search method effectively avoids local optimal solutions. This study aimed to propose a two-stage method suitable for the selection of hub locations in a ULS. Thus, the next stage was calculated based on the set of results obtained using the ACO.
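The AFV and ASCHC definitions given in this section can be illustrated with a short sketch. The class `SecondaryHub`, the helper names, and all sample volumes are hypothetical. The sketch assumes a single CHC shared by every hub and returns the ASCHC as a fraction rather than a percentage.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SecondaryHub:
    hub_id: int
    covered_dp_volumes: Dict[int, float]  # DP id -> freight turnover (t)


def actual_freight_volume(hub: SecondaryHub) -> float:
    """AFV: sum of the freight turnover of the DPs covered by the hub."""
    return sum(hub.covered_dp_volumes.values())


def aschc(hubs: List[SecondaryHub], chc_t: float) -> float:
    """Mean of AFV / CHC over all hubs, returned as a fraction in (0, 1]."""
    if not hubs:
        raise ValueError("at least one hub is required")
    return sum(actual_freight_volume(h) / chc_t for h in hubs) / len(hubs)


def single_dp_hubs(hubs: List[SecondaryHub]) -> List[int]:
    """Ids of hubs that serve exactly one DP."""
    return [h.hub_id for h in hubs if len(h.covered_dp_volumes) == 1]


# Hypothetical hub set, not taken from Table 1
example_hubs = [
    SecondaryHub(1, {101: 1200.0, 102: 1500.0}),
    SecondaryHub(2, {103: 2400.0}),
    SecondaryHub(3, {104: 900.0, 105: 600.0, 106: 1200.0}),
]

print(f"ASCHC = {aschc(example_hubs, 3000.0):.1%}")
print("hubs serving one DP:", single_dp_hubs(example_hubs))
```

Counting the single-DP hubs alongside the ASCHC is a convenient way to spot hub sets that look well utilized on average but contain isolated or dedicated hubs.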

Optimization Results for PH Location.
The PH locations were obtained by reclustering the optimized SH group, considering both the freight volume and the distance. The goal of clustering is to minimize the total distance between the PHs obtained and their subordinate SHs and to maximize the freight volume. Given that each PH needs to be linked with the LPs outside of the district, the expected number of clustering centres K was set to 4 in order to reduce the difficulty of cargo scheduling and improve the clustering effect. In this way, the PH can be regarded as a controlling intermediate node connecting the logistics park and the SHs.

From the perspective of management convenience, a one-to-one correspondence between parks and PHs is reasonable. Table 3 lists the four PHs and their subordinate SHs, which were calculated using the ISODATA. Their distributions are marked separately using different colours in Figure 7. The locations of the four generated PHs do not coincide with the existing DPs. However, it turns out that the positions of some of the SHs are very close to the PHs. In actual planning or further calculations, these SHs could be replaced by PHs. For example, the distance between SH 800 and PH A is only 0.087 km, so it is economical to let PH A assume the function of SH 800. When the PH location obtained through the clustering is not suitable under actual conditions, it is acceptable to adjust its position within the clustering area.
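The check for SHs that lie close enough to a PH to be absorbed by it can be sketched as follows. This is a post-processing illustration only and does not reproduce the ISODATA clustering. The function name `merge_candidates`, the 0.1 km threshold, and the planar coordinates are hypothetical. The coordinates were chosen so that SH 800 sits 0.087 km from PH A, as in the example above.

```python
import math


def merge_candidates(ph_xy, sh_xy, assignment, threshold_km=0.1):
    """List (sh_id, ph_id, distance_km) for SHs within threshold_km of their PH.

    ph_xy and sh_xy map ids to planar (x_km, y_km) coordinates, and
    assignment maps each SH id to the id of the PH it belongs to.
    Results are sorted by increasing distance.
    """
    close = []
    for sh_id, ph_id in assignment.items():
        d = math.dist(ph_xy[ph_id], sh_xy[sh_id])
        if d <= threshold_km:
            close.append((sh_id, ph_id, round(d, 3)))
    return sorted(close, key=lambda item: item[2])


# Hypothetical coordinates in km
ph_xy = {"A": (0.0, 0.0), "B": (5.0, 5.0)}
sh_xy = {800: (0.087, 0.0), 801: (1.2, 0.4), 820: (5.0, 5.05)}
assignment = {800: "A", 801: "A", 820: "B"}

for sh_id, ph_id, d in merge_candidates(ph_xy, sh_xy, assignment):
    print(f"SH {sh_id} could be absorbed by PH {ph_id} ({d} km apart)")
```

The threshold is a planning choice; in practice it would depend on site conditions rather than a fixed distance.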

Parameter Effects.
CHC and SR are worth discussing in terms of the modelling and computational analysis. The hub is usually equipped with an automated sorting and transmission system, and some of the goods that are transported from the SH to the customers still need to be delivered on the ground. Therefore, in addition to the constraint imposed by the underground space, these two parameters are largely subject to the automated logistics technology and the carrier distribution capacity. Given the similarity to the current express delivery process, the common data for general express delivery (Table 4) are used for the discussion. Table 5 summarizes the calculation results for the parameter combinations. Several valuable observations are as follows:
(1) The number of SHs decreases significantly with increasing CHC, and the relationship is almost linear. However, the number of SHs is not inversely proportional to the SR of the hub. The number of SHs is more sensitive to CHC than to SR. Therefore, effort should be made to improve the freight capacity of the hubs to reduce the number of SHs and thus to reduce construction costs.
(2) The number of hubs serving only one DP varies considerably under the same CHC but a different SR, especially when the SH has sufficient capacity to handle the AFVs of all of the DPs. For instance, when CHC = 4000 t and SR = 3 km, there are six such hubs, which is three more than when SR = 4 km. Similarly, when CHC = 5000 t and SR = 3 km, there are five such hubs, which is three to four more than under other SR conditions. The lower the number of these hubs, the more efficient the SH group. Therefore, the CHC and SR of a hub should be matched; otherwise, the mismatch will seriously affect the network construction costs and the operational efficiency.
(3) It is necessary to reserve some spare capacity at each node. Because the search aims at the minimum number of nodes, the global search results always make full use of the CHC. In this case, the AFVs of all of the hubs account for over 80% of the designed capacity on average, and the utilization rate of a few hubs even exceeds 90%. Although increasing facility utilization is a common goal, the risk of rapid future growth in urban freight demand cannot be ignored when operating at near-full capacity. A surge in freight volume that exceeds the capacity limits can lead to the collapse of the hub operations, as can equipment failures. If only a small number of nonadjacent hubs collapse when the capacity is exceeded, the other hubs can still maintain the network operation, but the collapse of multiple or adjacent hubs will lead to the inefficient operation of the entire ULS network, or even paralysis of the network.
(4) In addition, the standard deviations (SDs) of the PH AFVs presented in Table 5 reflect the degree of balance of the freight volume across the PHs. As can be seen, the SD of the AFV varies irregularly. When CHC = 4000 t and SR = 3 km, the SD reaches the maximum value, which is 15,726 t. An imbalance of the AFV would limit the overall operating efficiency of the network. The operating efficiency of the PH with the largest AFV would become the limiting component, which could negatively affect the operation of the entire network. Thus, it is necessary to choose a hub group set with a balanced AFV distribution. Furthermore, the PH capacity is not set as a fixed value in this study. Rather, each PH should be independently designed according to the practical engineering requirements, and the balance of the AFV distribution of the PH should be maintained as much as possible.
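Observations (3) and (4) suggest two simple screening checks, sketched below: flagging hubs that operate near full capacity, and comparing parameter combinations by the SD of the PH AFVs. The function names, the 90% limit used as a default, and the sample AFVs are hypothetical and are not taken from Table 5. The population SD is used here as an assumption, since the SD variant is not specified in the text.

```python
from statistics import pstdev


def near_capacity(afv_by_hub, chc_t, limit=0.9):
    """Ids of hubs whose utilization (AFV / CHC) exceeds the given limit."""
    return sorted(h for h, afv in afv_by_hub.items() if afv / chc_t > limit)


def most_balanced(ph_afvs_by_setting):
    """Return the (CHC, SR) key whose PH AFVs have the smallest population SD."""
    return min(ph_afvs_by_setting, key=lambda k: pstdev(ph_afvs_by_setting[k]))


# Hypothetical SH volumes (t) for CHC = 3000 t
sh_afv = {1: 2800.0, 2: 2400.0, 3: 2950.0}
print("hubs above 90% utilization:", near_capacity(sh_afv, 3000.0))

# Hypothetical PH AFVs (t) for two (CHC in t, SR in km) combinations;
# each list sums to the 67,112 t of underground freight.
ph_afvs = {
    (3000, 3): [16000.0, 17000.0, 17500.0, 16612.0],
    (4000, 3): [5000.0, 30000.0, 20000.0, 12112.0],
}
for setting, afvs in ph_afvs.items():
    print(setting, f"SD = {pstdev(afvs):.0f} t")
print("most balanced setting:", most_balanced(ph_afvs))
```

A planner would weigh these checks against the number of hubs, since the most balanced combination is not necessarily the cheapest to build.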

Conclusions
This paper proposes an integrated support system, based on big data technologies, for complicated logistics location problems such as ULS hub location determination. First, to calculate the TPI and OD matrix, the types of data generated across a typical data-driven logistics system and the methods of collecting these data were investigated. Second, a platform for big data analytics in logistics systems, including hub location selection, was developed. The data for the TPI and OD matrix were produced, collected, stored, and analysed using different modules in the platform. Third, a bottom-up two-stage hub location method and a visualization tool based on the GIS were incorporated into the platform. In this method, the underground cargo flow data used for the calculation were obtained by proportionally processing the OD matrix according to the TPIs of different demand regions. An SCP model under nonlinear constraints was used for the SH selection, which was solved using the ACO algorithm. Compared with the genetic algorithm (GA), the ACO algorithm gives better results, but it takes more time.
The PH locations were determined by clustering the SHs with the ISODATA, which allows the number of clusters to vary.
In addition, this study highlighted the significance of developing a big data analysis platform for large e-commerce enterprises and logistics companies to improve the efficiency of logistics management. Complicated logistics location is one of the important applications supported by the platform. Traditional logistics planning focuses on designing optimization algorithms to obtain improved solutions or choosing evaluation criteria for decision-making, but it neglects some basic variables, such as the TPI and OD matrix, the values of which are assumed to be known. The complete decision-support process for ULS hub locations, including data production, collection, storage, processing, and visualization, was integrated into the platform, which enables the smooth flow of information between different stages of the process. Moreover, considering the growing number of applications in logistics management, the platform was built using the Kafka system, which can plug in different data sources and applications. In this way, the platform can address the interoperability across the logistics data, the logistics process, and the logistics services. Different users of the platform, such as delivery vehicles, managers, and planners, can cooperate with each other.
This study provides an initial step in the investigation of the use of a big data platform for complicated logistics location determination. In future studies, the performance analysis of the big data platform for logistics management will be considered since determining the main factors that affect the quality of the platform is important for optimizing the design of the platform. More efforts are also needed to optimize the matching of the CHC and SR in the two-stage ULS hub location method before the hub group calculation. Other factors influencing location selection should also be investigated, such as the distance from a PH to its corresponding LP, the actual cost, and urban land use planning.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.