Bringing economies of integration into the costing of groupage freight

The purpose of this paper is to develop a novel calculation scheme for the costs of distribution per shipment according to a cost-by-cause principle. We propose to estimate the full costs of distribution routes excluding and including a new consignor. Then, we estimate the marginal costs per shipment and per consignor. The contributions of this paper are (1) a comprehensive list of drivers of economies of integration and (2) a calculation scheme for estimating the true marginal cost of new consignors. Practitioners may deploy the method and insights of this paper for tariff design, negotiations, consignor acquisition, and also demarketing.


Introduction to pricing in groupage freight
All freight forwarders face the same recurring problem of integrating new consignors into their distribution. A prospective new consignor who plans to outsource distribution of shipments always negotiates about discounts off the standard tariff (Baker 1991; Özkaya et al. 2010). The standard tariff is built on either historical and regulatory tariffs, the forwarder's cost structure, or a modified version of the competitors' tariffs. The pivotal argument of consignors is that more volume (ton-kilometres) results in better economies of scale as the large fixed costs decrease on a per-shipment basis. The distribution tour is viewed as a service production process, in which joint deliveries of many consignors are produced and thus the costs of that process are allocated to ever more shipments. Nevertheless, this is only one side of the coin. Every new consignor adds new shipments onto an incumbent shipment structure, which is distributed within an incumbent distribution network using incumbent vehicles, subcontractors, and tariffs. As a result, on the one hand, new consignors might complement the incumbent shipment structure smoothly, but on the other hand, they might disrupt optimized routes, increase the number of tours, and add far-off stops to the distribution. Therefore, whenever a freight forwarder acquires a new consignor, he must evaluate the fit of the new consignor's shipments and the incumbent distribution network's shipment structure to calculate a tariff, which covers the new (combined) shipment structure's costs. The difference between the calculated tariff and the standard tariff is the negotiating range. From collaborations with practitioners, we learn that negotiation ranges are based on historic rates, competition, and gut instinct. This may lead to unprofitable long-term deals because once contracted, the newly acquired shipments change the forwarder's distribution costs and thus the profitability.

The groupage freight forwarding process
The transportation network of freight forwarders is usually a three-echelon system. The process and the structure of that system are visualized in Fig. 1.

Collection
The first echelon is the collection. Typically, the number of shipments per ship-from location (elsewhere called pickup location) is already dense. Corporate shippers ship a large number of shipments to many destinations. Therefore, collection tours collect many shipments from a few consignors, which are ultimately addressed to many different recipients. As a result, groupage freight networks are essentially few-to-many transportation networks.

Line-Haul
The second echelon is line-hauling. At this stage, full truck loads (FTL) of bulk shipments are transported to a central hub or directly to a receiving terminal. The function of the central hub is sorting and consolidation of bulks of shipments from different origins towards the same receiving terminal.

Distribution
The third echelon is the distribution. The receiving terminal acts as a break-bulk terminal (Daganzo 1987), where the FTL bulks of shipments are broken up and sorted into daily distribution tours. Typically, the density of shipments per destination (stop factor) is slightly greater than 1. As a result, the distribution accounts for the major share, often more than 50%, of the total cost per shipment (Boone and Quisbrock 2010; Lohre and Monning 2007).

Problem definition
In this paper, we investigate the problem of a groupage freight forwarder (GFF) who wants to evaluate a new consignor. This evaluation problem is herein called the "New consignor integration problem" (NCIP). The NCIP occurs daily in the sales department of any logistics service provider (LSP), such as a GFF. Despite its practical relevance, the NCIP is not yet well researched.

The situation of a new consignor is the following
In the past, shipments have been shipped via an in-house fleet or by some LSP. For some reason, the consignor decided to move forward and now plans to outsource distribution or to replace the outsourcing partner. Therefore, the consignor, who outsources distribution, requests a proposal for a long-term shipping contract. In order to negotiate a large discount on the tariff, the consignor provides a history of past shipments to the prospective GFF (Harrington 1997).

The situation of the GFF is the following
There are many incumbent consignors feeding shipments into the GFF's distribution network. The incumbent shipment structure has some distinct, well-known properties, and the operational tours are optimized for this shipment structure. As a result, distribution costs in the past have had a certain level. Tariffs are calculated based on these costs (Ying and Keeler 1991). Now, as the prospective new consignor wants to feed more shipments into the incumbent network, the shipment structure changes and so does the cost structure. The GFF wants to analyse the provided shipment history with respect to the change in cost per shipment. Good consignors decrease the cost per shipment, neutral consignors do not change the cost per shipment, and bad consignors increase the cost per shipment compared to the previous shipment structure.

Research question
This paper addresses the NCIP and investigates the following research question: How should a freight forwarder calculate the impact of a new consignor's shipments on the costs per shipment?

Use cases for measuring a new consignor's impact on distribution
There are plenty of use cases for measuring a new consignor's impact on the distribution cost per shipment.
1. Most important, negotiations with prospective consignors benefit from better reasoning. First, reasonable hard facts in line with the cost structure replace arbitrary volume-based discounts. Second, the freight forwarder is protected against self-deceiving sales practices that try to gain revenue instead of profit.
2. A second use case is the transfer pricing within horizontal freight forwarding alliances. The forwarder who acquires a new customer is involved in the negotiations but is not responsible for the distribution of the new shipments. As a result, the acquiring forwarder grants a discount on the basis of fixed transfer prices (the receiving forwarder invoices the transfer price of a shipment to the acquiring forwarder). However, this might be a bad idea, as the transfer pricing may be outdated, ignore major drivers of distribution costs (Lohre and Monning 2007), or assume symmetrical and homogeneous shipment structure and volume. As a result, the acquiring forwarder and the receiving forwarder should exchange information and cooperate in the acquisition and integration of a new consignor with respect to its impact on the distribution costs.
3. A third use case is subcontracting. Freight forwarders outsource freight in order to fulfil their transportation requirements. Subcontractors are often self-employed carriers driving on behalf of the freight forwarder on a daily basis. Subcontracting tours, standard tours, or distribution districts is a common outsourcing practice in CEP (courier, express, parcel) and groupage industries. Forwarders and their subcontractors are interested in productivity, especially the over- or underutilization of tours and districts. Measuring a new consignor's impact on a tour or district is helpful for evaluating a subcontractor's expected workload and the fairness of wages among different subcontractors. Furthermore, such a measure is an early warning of overutilization and all its consequences such as delays, penalties, overtime hours, and idle time of additional capacities.

Outline
The next section reviews the body of literature on cost and revenue accounting in GFF starting from the period of deregulation in the early 1990s. It identifies a research gap with respect to a procedure for determining the new consignor's impact on costs per shipment. "Economies of integration" defines, explains, and dismantles the pivotal term "economies of integration" from the perspective of a GFF. Based on the insights from sections "Literature review" and "Economies of integration", "Methodology: a data-driven approach" lays out a procedure on how to calculate the marginal cost per shipment of a new consignor as a function of the shipment structure. In "Computational analysis", we demonstrate a computational case study using real-world data from a disguised GFF. The final section concludes this paper by discussing both theoretical and practical implications and outlining further research.

Literature review
The transportation industry was subject to very strict regulations in Germany until 1994 (Seiler 2012) and in the U.S. until 1980 (Mentzer 1986). Among other things, these regulations prescribed freight transportation tariffs. With the abolition of the so-called Güterfernverkehrstarif in Germany and the Motor Carrier Act in the U.S., the previously existing regulations were removed completely. Because of this deregulation, freight forwarders are permitted to deploy their own tariffs. This new process of pricing transportation services often uses the freight-specific attributes distance and tonnage to determine the price of an individual shipment (Smith et al. 2007). In this context, distance is measured as the length of a one-way trip from pickup location to delivery location. Tonnage describes the overall payload of a specific shipment. As a result, the LSP calculates the so-called base rates, which are mainly cost-driven (Ying and Keeler 1991). The reason is obvious: in order to make profits, the price per shipment has to include the marginal costs per shipment and a reasonable surcharge of overhead costs such as back-office processes, IT, and real estate. Consequently, cost-based pricing of shipments requires two pivotal pieces of cost information: the cost structure of distribution operations itself and the overhead cost. The literature offers many contributions on cost calculation in freight forwarding. Bø and Hammervoll (2010) calculate transportation costs with an extensive Microsoft Excel tool. They categorize costs by dividing them into fixed costs and variable costs. Fixed costs are independent of time and distance travelled. Variable costs depend on either the round tour's distance or its travel time. An interesting aspect of their categorization is the handling of wages. They assume that the driver gets paid per minute.
As a result, a driver who works for only 6 h due to underutilization is only paid for 6 h. In reality, the drivers are getting paid fixed wages per tour or day. Boone and Quisbrock (2010) categorize all costs as variable costs considering different cost drivers except the costs for the company/network containing central management, scheduling, and maintenance of IT-systems. An allocation algorithm is used to allocate those fixed costs among all shipments. The authors' main contribution is the conclusion that costs increase progressively as a function of the drop distance.
A very similar approach is used by Bokor (2011). He uses performance costs and performance units (e.g. the number of shipments, driven kilometres) to calculate the costs of one performance unit. The resulting rates per unit can be used to calculate the different costs of a transportation service. Sun et al. (2015) present a novel approach on how to estimate the long-term costs of a new delivery destination. The authors used multiple geographic factors of an incumbent network structure as input data for a neural network, which estimates the costs of every possible delivery location. The difference between their analysis and our method lies in the selection of input features. We assume that the exact shipment structure of a potential consignor is known, as proposed by Harrington (1997). Sun et al. (2015) do not deploy data on the new consignor and his new delivery destinations. Another difference is the incorporation of payload as a driver of cost. We propose to account for payload because heavier shipments consume more capacity and more loading time.
In this paper, the NCIP is solved using a modular methodology. The cost calculation scheme is adapted from Wittenbrink (2014). This calculation scheme accounts for four main cost types. First, the variable cost of kilometres driven is calculated. The second type is personnel cost. The third type is time-dependent fixed costs of asset usage. Finally, the last type is fixed overhead costs that cannot be allocated directly to a specific performance unit. Wittenbrink (2014) uses the same calculation process as Bokor (2011). He calculates the sum of every cost type (e.g. purchase price of truck, tyres, maintenance, and fuel). The next step is to estimate the average used performance units in the considered period. Thereafter, the costs per performance unit are calculated. The same principle of cost accounting is also found in Kaplan and Anderson (2003) where the authors describe the usage of unit costs that correspond to costs per performance unit. The so-called "time-driven activity-based costing" is exclusively described with the usage of time units as performance units. In our work, we are not only using time units as a calculation base. We adapt the principle to "unit-driven activity-based costing". Our performance units are driven kilometres and working time. The overhead costs that cannot be allocated by distance or time are allocated by the number of shipments. In Table 1 the main cost types, performance unit dependencies, and several exemplary components are illustrated.
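The unit-cost principle described above amounts to dividing each cost pool by the performance units consumed in the same period. The following minimal sketch illustrates the arithmetic. All figures and names (`cost_pools`, `performance_units`, `unit_costs`, `tour_cost`) are hypothetical and are not taken from Table 1 or from Wittenbrink (2014).

```python
# Hypothetical annual cost pools in EUR, grouped by the performance unit that drives them.
cost_pools = {
    "km": {"fuel": 18000.0, "tyres": 2000.0, "maintenance": 4000.0},
    "hour": {"driver_wages": 48000.0, "truck_depreciation": 12000.0},
    "shipment": {"overhead": 15000.0},
}
# Hypothetical performance units consumed in the same period.
performance_units = {"km": 80000.0, "hour": 2000.0, "shipment": 10000.0}


def unit_costs(pools, units):
    """Cost per performance unit = sum of the cost pool / units consumed."""
    return {unit: sum(items.values()) / units[unit] for unit, items in pools.items()}


def tour_cost(km, hours, shipments, rates):
    """Price a tour by multiplying its consumed units with the unit rates."""
    return km * rates["km"] + hours * rates["hour"] + shipments * rates["shipment"]


rates = unit_costs(cost_pools, performance_units)
example_cost = tour_cost(120.0, 7.5, 20, rates)
```

With these assumed figures, the rates are 0.30 EUR per km, 30 EUR per hour, and 1.50 EUR per shipment.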

Economies of integration
Economies of integration (EOI) are introduced by Keeler (1989). He defines EOI as follows: "[…] economies of integration, […] relate to all forms a large trucking firm can be more efficient than a small one.
[…] economies of integration include more than scale economies in the strictest sense. Economies of large-route networks can be thought of as economies of density combined with economies of vertical integration". Fleischmann (1993) calls a similar phenomenon "transport economies of scale". This leads to the supposition that freight forwarders do not only gain competitive advantage in the form of cost reductions by distributing more overall volume. Therefore, EOI are of major importance when a GFF evaluates the effect of new consignors. We consider the following characteristics to determine the extent of EOI.

Overall shipment structure
Shipments are the revenue and cost objects in accounting of freight forwarding companies. Lin et al. (2009) show an example of a consignor's shipment structure being the weight-demand per day. In that example, the average weight per shipment is 120 kg, which is a typical example of groupage freight. However, the majority of shipments have below-average weight and only few outliers are heavier, weighing up to 450 kg.

Volume of shipments
Giordano (2008) states that there is persistence of continuous economies of scale in the transportation industry. He refers to the total ton-miles per firm as a measurement for volume. An increase in ton-miles can be achieved by acquiring more shipments, heavier shipments, or shipments with greater length of haul.

Average payload of shipment
McMullen and Tanaka (1995) find that increasing average load and size per shipment is associated with significant economies of scale. Higher average payload of shipments increases the probability of better-utilized trucks. The costs per truck, driver, and driven kilometre can be split among a greater number of cost objects, thus decreasing the costs per object. On routes with lower densification, this effect has an even higher impact.

Drop factor: the average shipments per stop
The drop factor is an indicator of stop productivity and is defined as the average number of shipments that are delivered to the same destination. As Shah and Ward (2007) stated, an important part of lean production is the reduction of production downtime between product changeovers. In transportation, the production process is moving shipments from one location to another. Following that, the time when no shipment is moved is considered as production downtime. Whenever the driver stops at a delivery location, he loses some time for parking and taxi. In the case of delivering more than one shipment to the same destination, this stopping time can be split up between these shipments. The production downtime per shipment decreases and so do the costs.

Densification: the average distance between stops
Decreasing costs per shipment also result from better tour densification (McMullen and Tanaka 1995). Higher density directly leads to more shipments being distributed due to less driving time between stops. Keeler (1989) calls this "economies of density" due to "more traffic on one route". Less driving time stems from less average distance between subsequent stops. Densification means to decrease the average stop-to-stop distance of any tour. Another indicator of densification is area density: it is defined as the average number of stops per area unit. Area density can be improved through the acquisition of more shipments into the same area or district. The two indicators area density and tour density correlate positively.

Approach distance: the average distance from terminal to stops
The length of one tour is limited by the truck's capacity and the driver's maximum allowed working time per day. 1 We assume that every tour has fixed costs because of loading and scheduling before the start. The goal should be to utilize drivers to full capacity with one tour per day to save as many fixed costs as possible. Accordingly, the tours should be planned to reach the time restriction. Meeting the time restriction becomes more probable with a rising average length of haul. McMullen and Tanaka (1995) also find that a greater average length of haul is associated with lower costs per output (ton-miles). A significant increase in the average length of haul can be achieved with larger approach distances. The approach distance can be seen as an overhead of the tour length. It is the sum of the distance between the terminal and the first stop as well as the detour from the last stop to the terminal.
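The tour-level indicators described in this section (drop factor, average stop-to-stop distance, and approach distance) can be computed directly from a routed tour. The sketch below is only an illustration under simplifying assumptions: coordinates are planar and given in km, distances are Euclidean, and the function name `tour_indicators` and the example data are hypothetical.

```python
from math import hypot


def dist(a, b):
    """Euclidean distance between two (x_km, y_km) points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def tour_indicators(terminal, stops):
    """stops: list of ((x_km, y_km), n_shipments) in visiting order."""
    coords = [c for c, _ in stops]
    # Drop factor: average number of shipments delivered per stop.
    drop_factor = sum(n for _, n in stops) / len(stops)
    # Densification: average distance between subsequent stops.
    legs = [dist(a, b) for a, b in zip(coords, coords[1:])]
    avg_stop_distance = sum(legs) / len(legs) if legs else 0.0
    # Approach distance: terminal to first stop plus last stop back to terminal.
    approach_distance = dist(terminal, coords[0]) + dist(coords[-1], terminal)
    return drop_factor, avg_stop_distance, approach_distance


terminal = (0.0, 0.0)
stops = [((30.0, 40.0), 1), ((30.0, 46.0), 2), ((38.0, 46.0), 1)]
drop_factor, avg_stop_distance, approach_distance = tour_indicators(terminal, stops)
```

In this example, four shipments are delivered at three stops, so the drop factor is 4/3, and the two legs of 6 km and 8 km give an average stop-to-stop distance of 7 km.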

Methodology: a data-driven approach
We propose to apply a data-driven modular approach in order to compare the costs per shipment. Shipments are characterized by their distance and weight. The most important advantage of this characterisation is the practical applicability in tariff building and sales negotiations. Our approach to quantify the impact of a single new consignor is the following function: input data are both the GFF's shipment history and a history of the new consignor's shipments. The output data are costs per shipment. The marginal costs of the new consignor are the difference between the distribution costs per shipment with and without the additional shipments (Fig. 2).
The "NCIP Solver" function from Fig. 2 is the core of our proposed methodology and is further outlined in Fig. 3. There are four modular, sequential steps in the computation procedure: (1) vehicle routing, (2) cost accounting, (3) cost allocation, and (4) model building. Every module can be modified by the user in order to improve the suitability of its outputs. The idea here is to walk through the same process twice, first with the incumbent shipment history and second with the combined shipments including the new consignor. In the second walkthrough, the new shipments are incorporated into the routes of the GFF as if they were already integrated, without making any difference between incumbent and new shipments. Applying the same modules twice enables comparability.
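The two-pass idea can be sketched as a small function that runs the same cost procedure on the incumbent shipments and on the combined shipments and then takes the difference in cost per shipment. This is a toy illustration, not the NCIP Solver itself: `toy_distribution_cost` is a hypothetical stand-in for the routing and costing modules, and the capacity and cost figures are assumptions.

```python
from math import ceil


def toy_distribution_cost(shipments, tour_capacity=10, cost_per_tour=300.0):
    """Hypothetical stand-in for routing + costing: every started tour costs a fixed amount."""
    return ceil(len(shipments) / tour_capacity) * cost_per_tour


def cost_per_shipment(shipments, solver):
    """Total distribution cost divided by the number of shipments."""
    return solver(shipments) / len(shipments)


def marginal_cost_per_shipment(incumbent, new, solver):
    """Cost per shipment with the new consignor minus cost per shipment without."""
    return cost_per_shipment(incumbent + new, solver) - cost_per_shipment(incumbent, solver)


incumbent = list(range(25))  # 25 incumbent shipments need 3 tours
# 5 new shipments fill the third tour: cost per shipment falls.
good = marginal_cost_per_shipment(incumbent, list(range(5)), toy_distribution_cost)
# 6 new shipments force a fourth tour: cost per shipment rises.
bad = marginal_cost_per_shipment(incumbent, list(range(6)), toy_distribution_cost)
```

Under these assumptions, the first consignor lowers the cost per shipment from 36 to 30 EUR, while the second raises it because one additional shipment triggers a new tour.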
In the following subsections, we present our proposed methodology, which was developed over many experimental setups and runs. In the description of our procedure, we use the following set of variables:

Sets and indices
• $I'$: set of all incumbent shipments
• $I''$: set of the new consignor's shipments
• $T_t \in T$: tour, which includes a scheduled disjoint subset of shipments from $I$, $T_t = \{0, \dots, k, \dots, 0\}$, and starts and ends at the terminal $i = 0$
• $Ton_{T_t}$: tonnage of tour $T_t$ in kg
• Drop factor, which is the average number of deliveries per stop
• $CV_{T_t}$: variable driving costs of tour $T_t$ including personnel, fuel, and truck in EUR
• $CV$: overall variable driving cost of tours in EUR
• $ICC_{T_t}$: idle capacity cost of tour $T_t$ in EUR
• $ICC$: overall idle capacity costs for all tours in EUR
• $w_{kg} \in [0, 1]$: allocation weight of payload in relation to weight of distance
• $cv_i$: allocated variable costs per shipment $i$ in EUR
• Idle capacity cost surcharge rate per shipment
• $cf$: fixed cost surcharge per shipment for scheduling/loading in the morning and taxi at drop-off location in EUR
• Unloading cost surcharge per shipment $i$ in EUR
• Full costs per shipment $i$
• OLS coefficients of the costing model per shipment
• Estimated variable costs of shipment $i$ with characteristics $d_{0i}$ and $kg_i$ in EUR

Vehicle routing
The vehicle routing module applies operations research methodology to cluster and route daily distribution tours. The objective function is a minimum function that optimizes either mileage, duration, or costs. In general, this module permits the incorporation of manifold formulation variants from literature on the vehicle routing problem (VRP). Mandziuk (2019) reviews different modern problem formulations and solution methods for variants of the VRP. From a practitioner's view, the problem formulation in the routing module has to ensure the applicability and thus validity of the tours to compare. For example, an LSP who offers time windows to his consignors needs to account for these time windows in vehicle routing. We propose to use a VRP formulation with a homogeneous capacitated fleet and to apply well-known local search heuristics in the solution. Local search heuristics provide an acceptable trade-off between objective quality, computational speed, flexibility, and simplicity (Cordeau et al. 2002). For our purpose, an acceptable objective quality is sufficient, since we are interested in the effect of different inputs rather than different solution methods. Therefore, as long as the same methods are applied to both inputs I′ and I, the solution quality is comparable with respect to these inputs. Figure 4 visualizes our procedure: we initially apply the well-known savings algorithm by Clarke and Wright (1964). The results are then improved by a 2-opt intra-route search heuristic (Bräysy and Gendreau 2005a). Then, the 2-optimal routes are further improved by an inter-route 2-opt* heuristic (Bräysy and Gendreau 2005b).
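The first two steps of the routing procedure, savings construction followed by intra-route 2-opt, can be sketched as follows. This is a simplified illustration rather than the implementation used in the paper: it assumes Euclidean distances, a single capacity dimension, and no time restriction, and it omits the inter-route 2-opt* step. All function names and the example data are hypothetical.

```python
from math import hypot


def dist(a, b):
    return hypot(a[0] - b[0], a[1] - b[1])


def route_length(route, pts):
    """Length of terminal -> stops -> terminal; the terminal is pts[0]."""
    path = [0] + list(route) + [0]
    return sum(dist(pts[a], pts[b]) for a, b in zip(path, path[1:]))


def savings_routes(pts, demand, capacity):
    """Clarke-Wright savings: start with one route per stop, merge route ends by descending saving."""
    n = len(pts)
    routes = {i: [i] for i in range(1, n)}   # route id -> ordered stops
    route_of = {i: i for i in range(1, n)}   # stop -> route id
    savings = sorted(
        ((dist(pts[0], pts[i]) + dist(pts[0], pts[j]) - dist(pts[i], pts[j]), i, j)
         for i in range(1, n) for j in range(i + 1, n)),
        reverse=True,
    )
    for saving, i, j in savings:
        if saving <= 0:
            break
        ri, rj = route_of[i], route_of[j]
        if ri == rj:
            continue
        a, b = routes[ri], routes[rj]
        if sum(demand[k] for k in a + b) > capacity:
            continue
        # Orient both routes so that i ends route a and j starts route b.
        if a[0] == i:
            a = a[::-1]
        if b[-1] == j:
            b = b[::-1]
        if a[-1] != i or b[0] != j:
            continue  # i or j is an interior stop and cannot be linked
        routes[ri] = a + b
        del routes[rj]
        for k in b:
            route_of[k] = ri
    return list(routes.values())


def two_opt(route, pts):
    """Intra-route 2-opt: reverse segments as long as the route gets shorter."""
    best = list(route)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(cand, pts) < route_length(best, pts) - 1e-9:
                    best, improved = cand, True
    return best


pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 2.0), (-10.0, 0.0), (-10.0, 2.0)]
demand = [0, 1, 1, 1, 1]
routes = [two_opt(r, pts) for r in savings_routes(pts, demand, capacity=2)]
total_km = sum(route_length(r, pts) for r in routes)
```

Because the same heuristics are applied to both shipment sets, the absolute solution quality matters less than applying them consistently to both inputs.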

Cost accounting
This module isolates idle and fixed costs from variable costs per tour. Inputs for the costing module are
• tours from the vehicle routing module;
• cost coefficients ckm, ct, and cd;
• and estimates of parameters se, tx, and ul.
Idle costs arise by underutilization of drivers and trucks. Fixed costs arise through loading and scheduling before starting the distribution and whenever the driver stops (parking and taxi) at a drop-off location to deliver one or more shipments. Costs for unloading are also calculated separately because they are charged directly to distinct shipments. Variable tour costs include the costs per kilometre multiplied with the mileage travelled as well as the costs per hour (truck costs per hour + personnel costs per hour) multiplied with the tour duration.
These variable costs per tour are summed up and are called overall variable costs.
The overall variable costs are allocated to shipments in "Cost allocation".
In addition to variable costs, idle capacity costs arise per tour. We assume the costs of employing a driver and using a truck to be fixed per day. This means the driver gets paid to work 8 h a day. The truck's planned time-dependent depreciation is also calculated on 8 h of daily usage. Following this, idle capacity costs per tour are the result of unused time per tour multiplied with the costs per hour considering the driver and the truck. For example, a tour with a total duration of 7 h leaves one hour of idle time during which truck and driver incur costs that cannot be charged directly to any shipment.
The sum of all idle capacity costs is called the overall idle capacity costs.
The overall idle capacity costs are surcharged onto the variable costs. This surcharge rate is calculated by the ratio of overall idle capacity costs and overall variable costs.
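The variable and idle capacity costs described above can be illustrated with a short calculation. The sketch assumes that the cost coefficients denote a rate per km, a truck rate per hour, and a driver rate per hour; this reading of ckm, ct, and cd, as well as all numerical values and names, is an assumption made for illustration only.

```python
C_KM = 0.50        # EUR per km (hypothetical value)
C_TRUCK_H = 10.0   # truck cost in EUR per hour (hypothetical value)
C_DRIVER_H = 25.0  # personnel cost in EUR per hour (hypothetical value)
SHIFT_H = 8.0      # paid hours per day, as assumed in the text


def variable_tour_cost(km, hours):
    """Cost per km times mileage plus cost per hour (truck + driver) times duration."""
    return C_KM * km + (C_TRUCK_H + C_DRIVER_H) * hours


def idle_capacity_cost(hours):
    """Unused time of the 8 h day multiplied with the cost per hour of driver and truck."""
    return max(0.0, SHIFT_H - hours) * (C_TRUCK_H + C_DRIVER_H)


# Hypothetical tours as (mileage in km, duration in h).
tours = [(120.0, 7.0), (90.0, 8.0), (150.0, 6.5)]

overall_variable_cost = sum(variable_tour_cost(km, h) for km, h in tours)
overall_idle_cost = sum(idle_capacity_cost(h) for _, h in tours)
# Surcharge rate: ratio of overall idle capacity costs to overall variable costs.
idle_surcharge_rate = overall_idle_cost / overall_variable_cost
```

In this example, the 7 h tour leaves one idle hour costing 35 EUR, and the surcharge rate over all three tours is roughly 9.4%.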
To calculate full costs, the fixed costs are added. We assume the sum of all fixed tour costs and fixed stop costs to be distributed evenly among all shipments. The fixed tour costs are calculated by multiplying the costs per hour and

Cost allocation
The cost allocation module allocates the variable cost per tour onto the shipments $i \in T_t$ of that tour. The idle and fixed costs are then surcharged on top of the allocated variable tour costs. The input to this module is the resulting cost vector $CV_{T_t}$ from the cost accounting module and the output of the allocation is a cost vector $cv_i$. In the literature, many different allocation methods (AM) have been proposed. The inherent problem with the selection of an AM is how to evaluate its outputs. Since there is no observable and well-known correct benchmark result, any AM is to some degree arbitrary and not completely defensible (Thomas 1969, 1974). As different AM produce different outputs, economic consequences, and incentives, any AM can be more or less preferable over others in various circumstances. However, several criteria are proposed in the relevant literature (Fishburn and Pollak 1983; Kellner and Otto 2012; Kellner et al. 2014; Young 1994). AMs may be classified (cf. Kellner et al. 2014) as shipment-focused, marginal cost-focused, or stability-focused. In the context of distribution cost allocation, all three classes of AMs may be applicable, depending on the overall use case. For example, in a tariff-building use case, a shipment-focused AM seems appropriate; in a negotiation context, a marginal cost-focused AM may provide a lower bound in pricing; and for transfer pricing within a horizontal groupage freight alliance, a stability-focused AM is more suitable.

Selection of the AM
Herein, we propose to reference literature for recommendations and evaluate these literature-based recommendations with respect to applicability in the distribution case.
The suitable AM must generate a vector of the costs per shipment class $cv_i$, which enables the classification of shipments in a two-dimensional table by weight $kg_i$ and distance $d_{0i}$ (right-hand side of Fig. 2). The cost vector must be a progressive function of the distance $d_{0i}$ (Boone and Quisbrock 2010). That effect has been demonstrated by Boone and Quisbrock using a ring-radial model, but it is intuitively explicable (Fig. 5). As the distance increases, the number of shipments that fit onto a tour decreases due to the time restriction and the costs are allocated to fewer and fewer shipments, causing progressive costs per shipment. For example, within a 2-stop tour, two equal-weight, equal-distance shipments are delivered to their destinations, which are both 50 km away from the terminal. The driver returns to the depot just in time without violating the allowed driving duration. Therefore, both shipments have equal halves of the tour's variable cost allocated to them: $cv_1 = cv_2 = \frac{CV(T_0)}{2}$. If one of these shipments were not 50 but 51 km away from the terminal, the driver would inevitably violate the allowed driving duration and the vehicle routing module would not allow that 2-stop tour. Instead, the now farther away shipment has doubled the costs allocated to it. This marginal cost is much smaller for less distant stops on tours with higher volume.
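The progressive relation between distance and cost per shipment can be reproduced with a deliberately crude model. The sketch assumes that all stops of a tour lie at the same distance from the terminal, that driving between stops takes no time, that each stop receives one shipment, and that the tour cost is fixed; speed, stop time, and tour cost are hypothetical values. It is not the ring-radial model of Boone and Quisbrock (2010).

```python
def stops_per_tour(distance_km, shift_h=8.0, speed_kmh=50.0, stop_h=0.25):
    """Number of stops that fit into the shift after driving out and back."""
    travel_h = 2.0 * distance_km / speed_kmh
    return int((shift_h - travel_h) / stop_h)


def cost_per_shipment(distance_km, tour_cost_eur=280.0):
    """A fixed tour cost is shared by fewer shipments as the distance grows."""
    return tour_cost_eur / stops_per_tour(distance_km)


distances = [25.0, 50.0, 75.0, 100.0, 125.0, 150.0]
costs = [cost_per_shipment(d) for d in distances]
# Cost increase per additional 25 km; growing increments indicate a progressive curve.
increments = [b - a for a, b in zip(costs, costs[1:])]
```

Under these assumptions, the cost per shipment rises from 10 EUR at 25 km to 35 EUR at 150 km, and each additional 25 km adds more than the previous one.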
With respect to payload, we expect the AM to tend to allocate more costs to heavier shipments in a linear manner. The reason is obvious: if the marginal costs per kg were increasing, the consignor would have a monetary incentive to split up his shipments.

Experimental implementation of recommended AM
Kellner and Otto (2012) experiment with 15 different AM with respect to a broad set of criteria including robustness, coalition stability, and ease of application. In their paper, the authors consider the allocation of greenhouse gas emissions in one-to-many distribution networks. As they collect AM from literature on cost allocation, we consider their comparison relevant for this paper. They recommend three AM:
AM1 Proportional willingness-to-pay (PWTP) from Fishburn and Pollak (1983)
AM2 KM and Tons-KM Allocation (KTA), which the authors proposed themselves
AM3 Savings cost proportional allocation (SCPA), which we identified as a generalization from Fishburn and Pollak (1983) (1981)
Since none of the mentioned contributions comments on the relationship between costs and distance per shipment in distribution, we implemented all four and experimented with shipment data from a German GFF. In the case of KTA, we test several weights $w_{kg}$ of payload. For the allocation in Fig. 6, $w_{kg} = 0.3$ is set. Figure 6 visualizes some of our experimental results. We apply all AMs for inter-tour cost allocation, which means we allocated the total variable costs onto all shipments. 2 Visually, one can identify that the relation between costs and distance is not progressive for any of the AMs. PWTP and SCPA from Fishburn and Pollak (1983) performed identically and thus confirm that SCPA is a generalization of PWTP. However, both neglect payload. Indeed, the Pearson correlation of allocated costs and payload is 0.07 for both AM1 and AM3. This relation is better captured by KTA and LMA.

AM design
As all of the recommended AM fail to account for the progressive relation between costs and direct distance, we propose to design a new shipment-focused AM. We acknowledge the suitability of marginal cost and stability focused AM for respective use cases. Nevertheless, this paper investigates the true cost of a new consignor's shipments, and thus, we focus on those shipments' characteristics and how they perform in daily routing.
The following shipment-focused AMs are designed in order to produce the expected behaviour.

AM5: proportional normalized tons-km (pnTKM)
The KTA accounts for a combination of tons-km and km. The problem of using tons-km is the scale of payload and distance in distribution. While the distance usually scales from 0.4 to 90 km (with some outliers up to 150 km), the payload ranges from 1 to 4500 kg and thus outweighs the distance. Therefore, we normalize both payload and distance on a scale from 1 to 100. As a result, every shipment has the characteristic normalized tons-km $ntkm_i$. AM5 pnTKM allocates the variable costs $CV$ of all tours onto the shipments proportional to the normalized tons-km:

$$cv_i = CV \cdot \frac{ntkm_i}{\sum_{j \in I} ntkm_j}$$

Fig. 6 Experimental results of four recommended AM

2 We also experiment with intra-tour allocation, which means we apply the four AMs to allocate the variable costs per tour onto the shipments of that tour. The results of intra-tour allocation cause excessive variability in the allocated costs among comparable shipments. For example, repetitive homogeneous shipments that are addressed to the same destination on multiple days are routed in different tours. As a result, these homogeneous shipments receive different costs. Therefore, we recommend allocating variable distribution costs on an aggregated level, e.g. across multiple tours.

AM6: proportional normalized tons-km squared (pnTKM²)
In order to stress the progressive impact of the direct distance on costs, we experiment with squaring the distance and thus weighting it even more. AM6 pnTKM² allocates the variable costs $CV$ onto the shipments of all tours proportionally to the normalized tons-km², in which the distance enters squared.

AM7: normalized payload and normalized (distance)^a allocation (nPnD^aA)
AM7 is inspired by KTA from Kellner and Otto (2012). We propose three modifications. First, we account for payload instead of tons-km in order not to include distance twice. Second, we normalize both terms in order to align the scales of payload and distance. Third, we exponentiate distance after that normalization. AM7 nPnD^aA allocates the variable costs $CV$ onto the shipments of all tours proportionally to a weighted combination of the normalized payload and the normalized distance to the power of $a$. The exponent should satisfy $a \geq 1$. However, the choice of $a$ cannot be fully justified, which is true of any parameter of any AM.
The three designed AM 5-7 are implemented and tested with the same GFF data set. In the case of nPnD^aA, the weighting factor is set to $w_{kg} = 0.3$ and the exponent is set to $a = 1.5$. The results are visualized in the scatter plots of Fig. 7.
In the case of pnTKM and pnTKM², costs depend on both direct distance and payload. The visual effect is a large scatter in both univariate plots. In pnTKM², one can observe the progressive trend in the scatter plot (AM6 top). However, the visually most appealing results are achieved by nPnD^aA. There is both a clear progressive trend in the cost per km and an apparently linear trend in the cost per kg. This linear trend gets steeper with increasing weight $w_{kg}$ and vice versa. The more weight is put on distance, the less dispersion can be observed in the costs per km.
This paper does not intend to make the case for criteria of fairness, robustness, or neutrality. We therefore select the AM by the expected cost behaviour and propose to apply nPnD^aA in the cost allocation module in order to determine the costs per shipment (see footnote 3).
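The three shipment-focused AM described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the min-max normalization onto the scale from 1 to 100, the definition of normalized tons-km as the product of normalized payload and normalized distance, and the complementary distance weight (1 − w_kg) in nPnD^aA are assumptions, and all function names and sample values are hypothetical.

```python
def normalize(values, lo=1.0, hi=100.0):
    """Min-max scale values onto [lo, hi] (assumed form of the 1..100 normalization)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def allocate(total_cost, keys):
    """Split total_cost over shipments proportionally to their allocation keys."""
    key_sum = sum(keys)
    return [total_cost * k / key_sum for k in keys]


def pntkm(cv, payload_kg, distance_km, dist_exp=1.0):
    """AM5 (dist_exp=1) and AM6 (dist_exp=2): proportional normalized tons-km.

    Assumption: the key is normalized payload times normalized distance**dist_exp.
    """
    n_p = normalize(payload_kg)
    n_d = normalize(distance_km)
    return allocate(cv, [p * d ** dist_exp for p, d in zip(n_p, n_d)])


def npnda(cv, payload_kg, distance_km, w_kg=0.3, a=1.5):
    """AM7: weighted sum of normalized payload and normalized distance**a.

    Assumption: distance receives the complementary weight (1 - w_kg).
    """
    n_p = normalize(payload_kg)
    n_d = normalize(distance_km)
    keys = [w_kg * p + (1.0 - w_kg) * d ** a for p, d in zip(n_p, n_d)]
    return allocate(cv, keys)


# Made-up shipments: equal payloads, increasing direct distance.
payload = [100.0, 100.0, 100.0]
distance = [10.0, 50.0, 90.0]
cv = 1000.0  # total variable costs to allocate (illustrative)

costs_am5 = pntkm(cv, payload, distance)
costs_am6 = pntkm(cv, payload, distance, dist_exp=2.0)
costs_am7 = npnda(cv, payload, distance)
```

With equal payloads, the costs allocated by `npnda` grow more than proportionally with distance for any a > 1, which is the progressive behaviour the text asks for. Every method distributes exactly `cv`.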

Model building
To evaluate the shipment-specific cost differences between the incumbent structure and the structure including the new consignor, we build a cost table with the dimensions distance and payload. Every cell in that table represents a class of shipments with similar characteristics regarding distance and payload. The classes are sized in incremental steps of 10 kilometres and 10 kilograms. The scales reach up to 200 kilometres and 5000 kilograms. At this point, the costs of every single shipment are calculated, but there are large gaps in the cost table. In practice, this is a problem.
To build the model, a multiple linear regression of the form $y = X\beta + \varepsilon$ is performed. The output vector $y$ is the vector of allocated costs including idle costs per shipment (see footnote 4). The matrix $X$ contains the exponentiated normalized distances and the normalized payloads. As a result, the vector of coefficients $\beta$ for the exponentiated normalized distance and the normalized payload is estimated. Based on these coefficients, the variable costs per class of shipment characteristics can be estimated by evaluating the fitted regression at the class's exponentiated normalized distance and normalized payload. As a result of the model building, we get the following cost table (reduced for clarity; Table 2).
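The regression step and the filling of the cost table can be illustrated as follows. This is a sketch under assumptions, not the authors' code: the regression is fitted without an intercept, normalization uses fixed bounds (0 to 200 km, 0 to 5000 kg), classes are labelled by their upper bound, and the shipments are synthetic values generated from a known rule so that the fit can be checked. All names are hypothetical.

```python
def scale(value, vmin, vmax, lo=1.0, hi=100.0):
    """Min-max scale a value onto [lo, hi] (assumed normalization)."""
    return lo + (value - vmin) * (hi - lo) / (vmax - vmin)


def fit_ols_2(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def build_cost_table(b_dist, b_kg, a, km_range, kg_range, step_km=10, step_kg=10):
    """Estimated variable cost for every (distance class, payload class) cell."""
    table = {}
    for km in range(step_km, km_range[1] + 1, step_km):
        nd_a = scale(km, *km_range) ** a
        for kg in range(step_kg, kg_range[1] + 1, step_kg):
            table[(km, kg)] = b_dist * nd_a + b_kg * scale(kg, *kg_range)
    return table


KM_RANGE, KG_RANGE, A = (0, 200), (0, 5000), 1.5

# Synthetic shipments (made up): allocated costs follow a known linear rule.
dist_km = [5, 20, 60, 120, 180]
load_kg = [10, 800, 150, 3000, 40]
x1 = [scale(d, *KM_RANGE) ** A for d in dist_km]   # exponentiated normalized distance
x2 = [scale(p, *KG_RANGE) for p in load_kg]        # normalized payload
y = [0.02 * u + 0.1 * v for u, v in zip(x1, x2)]   # allocated cost per shipment

b_dist, b_kg = fit_ols_2(x1, x2, y)
table = build_cost_table(b_dist, b_kg, A, KM_RANGE, KG_RANGE)
```

Because the table is computed from the fitted coefficients rather than from observed shipments, every class receives a cost estimate, including classes in which no shipment was observed.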

Costing comparison
To compare the costs per shipment with and without the new consignor, two cost estimating models are built. With these two models, the cost of every shipment with and without the new consignor can be calculated in detail. We then search for correlations between the changes in costs and the changes in the network. To this end, we calculate the relative changes in the network's characteristics and the change in the costs per shipment. Performing multiple correlation tests helps us understand the interdependencies between all the calculated changes.
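The comparison described above reduces to relative changes per run and a correlation between them. The following sketch shows the arithmetic with made-up numbers for five hypothetical runs; the function names are illustrative. The t statistic is the standard test statistic for a Pearson correlation; converting it into a p-value requires the t distribution with n − 2 degrees of freedom, for example from a statistics library.

```python
from math import copysign, sqrt


def relative_change(before, after):
    """Relative change of a network characteristic or of the cost per shipment."""
    return (after - before) / before


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic of a Pearson correlation from n observations (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        return copysign(float("inf"), r)
    return r * sqrt((n - 2) / (1.0 - r * r))


# Made-up values: one entry per run, i.e. per candidate consignor added to the network.
stops_per_tour_before = 20.0
stops_per_tour_after = [20.4, 21.0, 20.1, 22.0, 20.2]
cost_per_shipment_before = 12.0
cost_per_shipment_after = [11.9, 11.6, 12.0, 11.1, 11.95]

d_stops = [relative_change(stops_per_tour_before, v) for v in stops_per_tour_after]
d_cost = [relative_change(cost_per_shipment_before, v) for v in cost_per_shipment_after]

r = pearson(d_stops, d_cost)
t = t_statistic(r, len(d_stops))
```

In this toy data, runs that add more stops per tour show larger cost reductions, so `r` is strongly negative.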

Computational analysis
In this section, we introduce the structure of our given data and the assumptions of the computational analysis. After that, our presented methodology is applied to the data of an anonymous German GFF. We analyse the results of our study in order to gain practical insights into the effect of a new consignor within a distribution network.

Data
Our data represent shipments of the distribution structure from a randomly chosen terminal of a German GFF. The given time period is a recent month. Owing to confidentiality agreements, the identities of the GFF, the consignors, and the recipients are anonymized. After data cleansing, the data sample includes 3742 shipments with the attributes listed in Table 3. In addition to the raw data from Table 3, we compute the following attributes per shipment (Table 4).

Assumptions
In order to estimate the costs per shipment, we calculate different cost rates. Truck costs per hour, truck costs per kilometre driven, and drivers' hourly wages are calculated with the schemes described by Wittenbrink (2014) and Hartmann (2019). The results are ct = 7.5 € per hour per truck, ctkm = 0.7 € per kilometre driven, and cd = 20.5 € per hour per driver. Our calculations of the cost rates are shown in the appendix. Furthermore, we estimate the loading and scheduling time at the beginning of a tour to be se = 1 h. The time for parking and taxi is tx = 0.13 h and the time for unloading is ul = 0.0003 h per kg. These values are estimates from past projects with other GFFs and are thus not specific to the GFF in our computational study.
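To make the role of these rates concrete, the following sketch combines them into the cost of one tour. The combination rule is an assumption for illustration only: time-dependent costs (truck and driver) are charged for the whole tour duration, distance-dependent costs per kilometre driven, the parking and taxi time is applied once per stop, and the driving time is supplied as an input. The function names and the example tour are hypothetical.

```python
CT = 7.5      # EUR per hour per truck
CTKM = 0.7    # EUR per kilometre driven
CD = 20.5     # EUR per hour per driver
SE = 1.0      # h, loading and scheduling at the beginning of a tour
TX = 0.13     # h, parking and taxi (assumed to apply once per stop)
UL = 0.0003   # h per kg unloaded


def tour_duration_h(driving_h, stops, payload_kg):
    """Tour duration: setup, driving, parking per stop, and unloading per kg."""
    return SE + driving_h + stops * TX + payload_kg * UL


def tour_cost_eur(driving_h, km, stops, payload_kg):
    """Tour cost: hourly truck and driver rates plus the kilometre rate (assumed rule)."""
    hours = tour_duration_h(driving_h, stops, payload_kg)
    return (CT + CD) * hours + CTKM * km


# Made-up tour: 4 h driving, 150 km, 20 stops, 3000 kg delivered.
example_hours = tour_duration_h(4.0, 20, 3000.0)    # 1 + 4 + 2.6 + 0.9 = 8.5 h
example_cost = tour_cost_eur(4.0, 150.0, 20, 3000.0)  # 28 * 8.5 + 0.7 * 150 = 343 EUR
```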

Computational result
In order to answer the research question, we implement the proposed methodology from "Methodology: a data-driven approach". Thereafter, we perform multiple runs for the largest consignors in the data sample. A consignor is considered large if its number of shipments or its tonnage ranks among the top 25 consignors. After removing the duplicates between the two lists, 31 consignors remain.
From a practitioner's view, it is interesting to investigate the most valid predictors of cost differences. Therefore, correlation tests between the changes in the shipment structure and the cost differences are performed. The results are shown in Table 5.
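The selection rule for large consignors is the union of two top-25 rankings. A minimal sketch, with hypothetical names and made-up shipment records of the form (consignor, payload in kg):

```python
from collections import defaultdict


def large_consignors(shipments, top_n=25):
    """Consignors ranking in the top_n by number of shipments or by tonnage."""
    count = defaultdict(int)
    tonnage = defaultdict(float)
    for consignor, payload_kg in shipments:
        count[consignor] += 1
        tonnage[consignor] += payload_kg
    top_by_count = sorted(count, key=count.get, reverse=True)[:top_n]
    top_by_tonnage = sorted(tonnage, key=tonnage.get, reverse=True)[:top_n]
    # The set union removes consignors that appear in both rankings.
    return set(top_by_count) | set(top_by_tonnage)


# Made-up records; top_n=1 keeps the toy example readable.
sample = [("A", 10), ("A", 20), ("B", 900), ("C", 5), ("C", 5), ("C", 5)]
selected = large_consignors(sample, top_n=1)
```

Here consignor C leads by number of shipments and consignor B by tonnage, so both are selected. With `top_n=25` the result contains between 25 and 50 consignors, depending on the overlap of the two rankings.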

Correlations' significance
As we are interested in the impact of the shipment structure on costs, we focus on the correlations between the structural changes and the change in costs per shipment, which are shown in the last column of Table 5. Before interpreting the data, we check the p-values of our correlations to sort out non-significant results. We apply a significance level of 0.05, i.e. correlations with p > 0.05 are discarded. Discarded correlations are marked by crossed-out values.
The set of characteristics with the strongest absolute correlation to the change in costs per shipment (p ≤ 0.05, correlation ≥ 0.8) consists of the stops per tour, the shipments per tour, the average stop-to-stop distance, and the overall number of shipments. A plausible explanation is the following: a shorter stop-to-stop distance results from denser tours. This means that more stops are performed within the same or a smaller number of kilometres driven, which is limited by the maximum driving time. To achieve this, more shipments overall have to be distributed via the network.

Multicollinearity
As there is a correlation of 1.0 between the characteristics shipments per tour and stops per tour, we observe multicollinearity and treat them as one identical feature in the interpretation. In general, the correlations between the most important characteristics named above are high. All of them lie in an interval ranging from 0.66 to 0.88, which suggests that a single effect is described by a set of characteristics.

Discussion
The present paper contributes both to the theory and practice of distribution. However, there are also some limitations to our results and plenty of room for further research.

Table 3 (excerpt) Shipment attributes
Attribute | Description
... | The pickup street
Consignor zip-code | The pickup zip-code
Consignor city | The pickup city
Consignor country | The pickup country
Recipient no. | The recipient's number
Recipient street | The drop-off street
Recipient zip-code | The drop-off zip-code
Recipient city | The drop-off city
Recipient country | The drop-off country
Payload | The shipment's payload in kilograms

Theoretical implications
Economies of integration
Keeler (1989) introduces the term EOI. This is a major contribution to transportation theory because EOI clearly differ from economies of scale. A major contribution of the present paper is the comprehensive list of indicators that drive EOI in distribution. In "Economies of integration", we identify shipments, payload, certainty, drop factor, tour density, and area density as well as approach distance. Furthermore, the correlation matrix of Table 6 highlights the importance of different characteristics of distribution with respect to their impact on costs. For example, it turns out that the overall number of shipments has a greater impact than the overall tonnage.

Relation between costs per shipment and distribution structure
The "Computational analysis" section reveals several insights about the relationship between the costs per shipment and other characteristics of the distribution shipment structure. Our analysis shows that a higher number of stops and shipments per tour decreases the costs per shipment, as the coefficient of correlation is smaller than −0.8. Furthermore, a shorter average stop-to-stop distance and a larger overall number of shipments in the distribution significantly decrease the costs per shipment. We could not show that the drop factor decreases costs significantly. However, this may be the result of an overall low level of drop factors for all consignors in our sample. Summarizing the results from the computational study, we conclude that (1) without a great volume, consignors have only limited impact on the overall cost structure, (2) consignors with a great volume usually reduce the average costs per shipment for all consignors, and (3) the exact level of cost reductions depends on the densification of tours and thus needs meticulous investigation.

Development of a new AM based on recommendations from the literature
In "Cost allocation", we identify recommended AM from the relevant literature. However, none of the recommendations succeeds in providing the progressive functional relation between distance and costs described by Boone and Quisbrock (2010). The identification of this insufficiency is a contribution in itself. Furthermore, we propose a further development of the recommendations from Kellner and Otto (2012), which incorporates a normalization to overcome the different scales and a power or exponential term to overcome the degressive relation.

Distribution costing and cost-based tariff design
The proposed methodology is of great practical use. Due to its modular design, it is easily adaptable to specific use cases and may be integrated into an existing IT landscape. Calculations of cost rates (Appendix) and estimates of durations need to be adapted. However, we present a valid starting point for practical usage. The output of our methodology is the well-known structure of transportation tariffs with the dimensions of distance and payload. Therefore, the methodology can be applied for tariff design (Table 7). Table 5 shows the changes in costs per shipment for different consignors. As it turns out, the EOI of a single consignor often decrease the costs per shipment, but not without exceptions. Take, for example, consignor 14 from Table 5: this consignor adds 2.3% of new shipments to the incumbent shipments. However, the average costs per shipment increase. Reviewing the EOI indicators, we assume that the increase in the average payload per shipment causes the increase in costs. As a result, the 14th consignor is not eligible for a discount, although it accounts for a large volume. Based on our results, we make the following general recommendations for tariff negotiations:

Individual consignor rates and discounts
• Calculate the relative number of shipments added by the new consignor.
• A consignor who adds less than 1% of shipments to the incumbent shipments should always pay the standard tariff.
• For large consignors who add more than 1%, a new tariff based on the combined shipment structure should be calculated and the negotiating range should be derived using the proposed methodology.
• Discounts should not exceed the reduction of costs per shipment.
• New consignors who do not present their shipment structure should always pay the standard tariff, and after some period their shipments should be analysed in order to determine their EOI.
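The recommendations above can be read as a simple decision rule. The sketch below is one possible encoding with hypothetical names. Treating a share of exactly 1% as large is an assumption, since the recommendations only cover "less than" and "more than" 1%, and the cost figures in the examples are made up.

```python
def tariff_recommendation(new_shipments, incumbent_shipments, structure_known,
                          cost_before, cost_after, threshold=0.01):
    """Return (tariff type, maximum discount as a fraction of the standard tariff).

    cost_before / cost_after: average cost per shipment without / with the new consignor.
    """
    if not structure_known:
        # No shipment data presented: standard tariff, analyse the shipments later.
        return ("standard tariff", 0.0)
    share = new_shipments / incumbent_shipments
    if share < threshold:
        return ("standard tariff", 0.0)
    # The discount must not exceed the relative reduction of the costs per shipment.
    reduction = max(0.0, (cost_before - cost_after) / cost_before)
    return ("individual tariff", reduction)


# Small consignor: 0.5% of the incumbent shipments.
small = tariff_recommendation(5, 1000, True, 10.0, 9.0)
# Large consignor whose shipments reduce the average cost per shipment by 5%.
large = tariff_recommendation(50, 1000, True, 10.0, 9.5)
# Large consignor (2.3% of shipments) whose shipments increase the average cost.
costly = tariff_recommendation(23, 1000, True, 10.0, 10.2)
```

The last case mirrors the situation of a consignor with large volume but no cost reduction: an individual tariff is calculated, yet the admissible discount is zero.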

Allocation method
It is worthwhile to investigate new AMs which are aligned with the progressive relation that is described by Boone and Quisbrock (2010). We propose a new AM, developed from recommendations from the literature. Cost allocation within the calculated tariff table is heavily dependent on the chosen AM. Modifications in the AM module will modify the outputs and thus the negotiation rates. With that in mind, the costs of a new consignor can be modified by choosing an AM that attaches more or less weight to either distance or payload. For example, choosing an AM which stresses payload is going to increase the allocated costs of a new consignor adding only high payloads to the system. The investigation of AM that account for the progressive relation of cost and distance is an interesting research gap.

Drop factors
As our analysis could not find significant results for the relation between drop factor and costs, this relation is subject to further research using data with much higher drop factors. In our sample, no single consignor has a significant impact on the overall drop factor and thus no cost reductions could be observed. From a practitioner's view, we would expect that dedicated GFFs, who serve only consignors from selected industries, e.g. automotive parts, pharmaceuticals, or fresh groceries, achieve greater drop factors, as the recipients are often identical.

Collection and line-haul
The present paper investigates the distribution of groupage freight. As distribution often accounts for more than 50% of the total costs per shipment, this last mile is the most important part. Nevertheless, a GFF should also investigate the costs per shipment of collection and line-haul using a similar methodology. Our proposed methodology is easily applicable to the collection in groupage freight. From a costing perspective, line-hauling is the easiest part. Since the present paper is limited to distribution, we consider a comprehensive model that comprises all three legs to be a worthwhile practical contribution.

Hidden consignor clusters
Our analysis looked into the effect of a single new consignor. However, the shipments from different consignors interact: stops and tours are clusters of shipments from many different consignors. It may be the case that some subsets of consignors have great EOI in combination. For example, a producer of paints and a producer of ironmongery always send shipments to the same hardware and DIY stores. Therefore, both consignors have great EOI in combination with each other, even though they neither cooperate nor know about this relation. Further research may extend our work to complementary hidden consignor clusters. From a GFF's perspective, it may be valuable to unveil such hidden clusters in order to (1) make sure that none of their members has an incentive to leave the network and (2) acquire new consignors who integrate smoothly with incumbent hidden clusters.

Stochastic inputs and robustness
We consider our given dataset including shipments over 1 month to be representative in terms of the long-term shipment structure. However, there may be consignors whose data cannot be considered deterministic. The new consignor's shipment structure may be constantly changing or, in the worst case, the consignor may deceive the GFF with modified shipment data in order to blend in smoothly with the incumbent structure. Therefore, incorporating robustness against stochasticity or changes in the shipment structure is an important issue in tariff design. One starting point could be an additional safety premium. This safety premium should account for uncertainty in the number and the location of shipments.

Multiple consignor strategy
In case of two or more consignors entering the system at the same time, a sequential calculation of the consignor-specific negotiating range leads to different results depending on the sequence of calculation. This is because adding one consignor's shipments changes the shipment structure and thus leads to a different baseline for the second consignor. There is no fixed rule that implies whether the second consignor will benefit from the first one or vice versa. An extreme example is two consignors that are not very suitable for the incumbent shipment structure in terms of density. Let us assume their drop-off locations are outside of the LSP's incumbent service area but relatively close to one another. As a result, whichever of the two consignors is added first will decrease density and increase costs per stop. The second consignor will benefit from the first consignor's added stops and increase density, as the stops of both consignors are very close to each other. As a result, the costs per stop will decrease compared to the shipment structure including the stops from the first consignor. There are two options to solve this problem.
Option 1: Aggregate the new consignors' shipments into one structure and then use our methodology to calculate one single negotiating range.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.