A Big Data Decision-making Mechanism for Food Supply Chain

Many companies have captured and analyzed huge volumes of data to improve decision making in the supply chain. This paper presents a big data harvest model that uses big data as inputs to make more informed decisions in the food supply chain. By introducing a Bayesian network method, this paper integrates sample data and finds cause-and-effect relationships in the data to predict market demand. A deduction graph model that translates food demand into processes and divides processes into tasks and assets is then presented, together with an example of how big data in the food supply chain can be combined with the Bayesian network and the deduction graph model to guide production decisions. Our conclusions indicate that the decision-making mechanism has vast potential for extracting value from big data.


Introduction
Data have now been woven into every sector of the global economy. Companies focus on capturing relevant information from multiple sources, such as suppliers and customers, to build a clearer and more complete picture of the existing business process [1]. Big data analytics helps companies identify new opportunities and requirements for new products and find new kinds of services by integrating large amounts of trading, real-time and historical information. Now, a complementary trend is under way. Information is multiplying and being shared more widely around the world, which provides the basis for advanced analysis of big data and enables new applications, such as the smartphone app that tells commuters when the next bus will arrive. This tendency carries profound significance for companies, governments, and individuals.
These developments have changed the operation management of the food supply chain beyond recognition. As companies capture, store, search, share and analyze huge volumes of data, radical customization and novel business models will be the new hallmarks of competition. The application of big data in the food supply chain has therefore been receiving increasing attention. Taylor & Fearne (2006) regard big data as a prerequisite for the development of a more synchronised approach to demand and activity analysis for the food supply chain [2]. Tien (2012) points out that big data analytics is a key support technology for implementing mass customization in food production, such as nanomodified foods and nano-additives [3]. Anica-Popa (2012) indicates that data sharing in the food supply chain will improve food quality and safety [4]. These analyses all underline the importance of big data to the food supply chain.
As a result, given the explosion of data, companies need not only skills but also new perspectives on how big data can help solve problems in the food supply chain. In this paper, we propose a big data harvest model for the food supply chain. Our intent is to develop a decision-support tool that converts data into insights to support more informed strategic decisions. The paper is organized as follows. We first describe the value of big data in the food network and develop a big data harvest model. The model is subsequently tested with an example. Then, we discuss and summarize our findings.
Customers have broader access to a massive amount of food information, which allows them to make more informed decisions. For example, customers can learn the price of food ahead of time to decide what to buy.
Moreover, it has been estimated that hundreds of billions in value per year could be enabled by the use of big data in food logistics. These data are captured by government, transportation operators, individuals, and third-party data providers. One of the largest potential benefits comes from using big data to enhance the ability to deliver and adapt to customers in real time. Another benefit is that companies can optimize every process step, from procurement to production to marketing, by uncovering new insights hidden within the data.

    Data captured by: Retailers; Suppliers; Government; Third-party data brokers; Customers
    Benefits: Improved product design and production; Efficient store operations; More targeted marketing and sales; Offering price transparency; Considerate post-sales services
    Challenges: Privacy concerns; Lack of understanding; Technical challenges

  Food logistics
    Data captured by: Government; Transportation operators; Individuals; Third-party data providers
    Benefits: Technology investments and innovation; Smarter and faster decision making; Delivering the optimal experience for the customers
    Challenges: Cost and privacy concerns; Technical challenges; Extent and quality of the available data

Generally speaking, there are five main ways to leverage big data in the food network. They offer insight into opportunities and challenges and have implications for how organizations will have to be designed, organized, and managed.
(1) Creating transparency: As big data in the food network becomes more available across sectors, transparency of data drives transformation, increases productivity and leads to informed decision making.
(2) Enabling experimentation to identify anomalies, detect fraud and improve performance: Big data, much of it unstructured or machine-generated, needs to be collected, integrated and analyzed in real time to discover anomalies and fraud, which helps organizations improve operations and develop services.
(3) Micro-segmentation to customize actions: Big data makes it possible to work through various streams of customer data, define increasingly fine segments and target marketing precisely to meet customers' needs.
(4) Replacing or supporting decision making and data analysis with automated algorithms: Big data analytics and visualization with automated algorithms allow organizations to find unknown patterns in the food network in a time-efficient and cost-effective manner.
(5) Innovating new business models, products, and services: Using vast amounts of data provides new perspectives that can fuel innovation in food products and services, for example by offering clues about how customers will behave.

Big data harvest model
Although the potential value of big data is tremendous, it is extremely hard for existing analytics to analyze a high volume (and variety) of data in real time and produce useful information [6]. Although many data techniques might help managers produce a lot of information, they are unfocused and hence inefficient. It is therefore imperative to provide an analytical framework that structures and links various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analyzed can be gained.
We therefore propose an analytic infrastructure that makes use of the available big data to gain competitive advantages in food network management (see Fig. 1). First, we identify from big data analytics the products that could meet future markets; then, we translate product demand into processes and divide processes into tasks and assets; finally, we meet the market demand through chain coordination and continuous evaluation. For the first part, many methods are available for predicting the market demand for food, such as the Delphi method, time series analysis and regression analysis. These methods mainly use historical data to forecast market demand, but market demand depends on a variety of complex factors, including service quality, consumer groups and government policy. Moreover, these factors can be obtained from big data. If we take these factors into consideration, we can improve the precision of prediction and help ensure product success. Bayesian networks can make effective use of all available data, diagnose what causes high preference and incorporate expert knowledge by representing the relationships among a set of variables [7,22]. We therefore use Bayesian networks, which link various streams of data in the food chain, to predict market demand. Anderson et al. (2004) regard Bayesian network methodology as the implementation mechanism for causal modelling and build a Bayesian network model of customer service satisfaction [8]. Corney (2000) applies Bayesian networks to a typical food design problem, and the results show that they are powerful tools for modelling consumer preference from a combination of data and expertise [9]. Further applications of Bayesian networks in food production include food security, food risk and consumer behaviour [10][11][12].
The structure of a Bayesian network is a directed graphical model in which nodes represent random variables of interest and directed arcs represent direct causal or influential relations between nodes [13]. Each node $X$ has a conditional probability $P(X \mid \pi(X))$ given its parent set $\pi(X)$, so for a network of $n$ nodes $X_1, \ldots, X_n$ we can factor the joint probability distribution as
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi(X_i)) \qquad (1)$$
In particular, production decisions are supported by calculating and analyzing the Bayesian network that is set up based on the big data in the food supply chain. An analytical framework is presented in Fig. 2.
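The factorization in Eq. (1) can be illustrated with a minimal sketch. The three-node chain (Sugars, Sweetness, Preference) borrows node names from Fig. 3, but the arcs, the value labels and every probability below are hypothetical illustration values, not figures from the paper.

```python
# Hypothetical three-node chain: Sugars -> Sweetness -> Preference.
# All probabilities are made-up illustration values.
parents = {"Sugars": [], "Sweetness": ["Sugars"], "Preference": ["Sweetness"]}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
cpt = {
    "Sugars": {(): {"low": 0.6, "high": 0.4}},
    "Sweetness": {
        ("low",): {"mild": 0.8, "sweet": 0.2},
        ("high",): {"mild": 0.3, "sweet": 0.7},
    },
    "Preference": {
        ("mild",): {"dislike": 0.5, "like": 0.5},
        ("sweet",): {"dislike": 0.25, "like": 0.75},
    },
}


def joint_probability(assignment):
    """Multiply P(X_i | parents of X_i) over all nodes, as in Eq. (1)."""
    prob = 1.0
    for node, node_parents in parents.items():
        key = tuple(assignment[p] for p in node_parents)
        prob *= cpt[node][key][assignment[node]]
    return prob


# P(Sugars=high, Sweetness=sweet, Preference=like) = 0.4 * 0.7 * 0.75
p = joint_probability({"Sugars": "high", "Sweetness": "sweet", "Preference": "like"})
```

Storing one table per node, keyed by the parent values, keeps the number of stored probabilities proportional to the size of each local table rather than to the size of the full joint distribution.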

Fig. 2. An analytical framework based on big data in food supply chain
In the second part, for the product demand identified in the first part, we examine an analytic technique that translates product demand into processes and divides processes into tasks and assets. Li (1994) proposes an analytic technique called the deduction graph model that allows firms to combine their own competence sets with those of other firms [14]. It provides a sequence of optimized expansion processes in a visual way by linking different competence sets from various sources [15]. Although this approach has not previously been adopted in big data analytics, we have developed it further so that it provides the analytic capabilities firms need to produce a detailed process design and enhance food supply chain innovation. Our aim is to develop an optimization model that extracts value from big data to improve food supply chain performance, and that can also help incorporate the capabilities and information (big data) of group decision makers to maximize the benefits of big data. The following sections describe the detailed application of the proposed analytic approach in a food company.

Construction of a Bayesian network
A food company is keen to explore how to use big data to acquire potential value and enhance its supply chain performance. The company analyzes the market demand using Bayesian networks. A brief description of the steps is given below. The first step is sample selection. Big data in the food supply chain contains a large number of behaviour patterns related to consumers' preferences. To select appropriate sample data from this big data, the company combines prior knowledge of the food market with advice from experts and decision makers to identify and describe the factors that affect market demand. These factors mainly include food attributes and the chemical and physical properties of the product related to these attributes [16]. Once these factors are determined, they become the nodes of the Bayesian network. Based on these factors, the company collects data on m representative consumers, where each consumer record contains a value assignment for each factor.
The second step is to pre-process the sample data. Before modelling, the values of the factors need to be discretized, using a clustering algorithm or a hierarchical categorization, so that the propagation and inference algorithms in the following sections can be applied.
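As one possible reading of the clustering-based discretization in the second step, the sketch below groups a continuous factor into three levels with a one-dimensional k-means (Lloyd's algorithm). The choice of k-means, the factor name `sugar_g_per_100g`, the sample values and the level labels are all assumptions made for illustration; the paper does not name a specific clustering algorithm.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values into k groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
    return sorted(centres)


def discretize(value, centres, labels):
    """Map a value to the label of its nearest centre."""
    nearest = min(range(len(centres)), key=lambda c: abs(value - centres[c]))
    return labels[nearest]


# Hypothetical measurements for one factor across nine consumer records.
sugar_g_per_100g = [2.0, 2.5, 3.0, 9.5, 10.0, 10.5, 19.0, 20.0, 21.0]
centres = kmeans_1d(sugar_g_per_100g, k=3)
levels = [discretize(v, centres, ["low", "medium", "high"]) for v in sugar_g_per_100g]
```

After this step every factor takes one of a small number of labels, which is the form the conditional probability tables of a discrete Bayesian network expect.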
The third step is to build the Bayesian network. This involves two parts: identifying the network structure and determining the conditional probability tables. A selection of search algorithms that can be used to learn Bayesian networks is shown in Table 2 [17]. Figure 3 presents part of a Bayesian network for food preference, although the network will be more complex in practice [12]. It is built using the K2 algorithm [18]. Given the data $D$ processed in the previous steps, a Bayesian network structure $S^h$ is sought that maximizes $P(S^h \mid D)$.
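To make the structure search more concrete, the following sketch scores one node against a candidate parent set using the K2 metric in its commonly cited Cooper and Herskovits form. That this is the scoring function meant by reference [18] is an assumption, and the records, the function name `log_k2_score` and its parameters are hypothetical.

```python
from collections import Counter
from math import exp, lgamma


def log_k2_score(records, child, parents, num_values):
    """Log of the K2 term for one node.

    For each parent configuration j with N_ij records, the term is
    (r - 1)! / (N_ij + r - 1)! times the product over k of N_ijk!,
    where r = num_values is the number of values the child can take.
    """
    counts = Counter((tuple(rec[p] for p in parents), rec[child]) for rec in records)
    totals = Counter()
    for (config, _value), n_ijk in counts.items():
        totals[config] += n_ijk
    score = 0.0
    for n_ij in totals.values():
        score += lgamma(num_values) - lgamma(n_ij + num_values)
    for n_ijk in counts.values():
        score += lgamma(n_ijk + 1)
    return score


# Hypothetical, already discretized consumer records.
records = (
    [{"Sweetness": "sweet", "Preference": "like"}] * 4
    + [{"Sweetness": "mild", "Preference": "dislike"}] * 4
)

without_parent = log_k2_score(records, "Preference", [], num_values=2)
with_parent = log_k2_score(records, "Preference", ["Sweetness"], num_values=2)
# How many times better the data are explained when Sweetness is a parent.
ratio = exp(with_parent - without_parent)
```

A greedy search in the style of K2 would keep adding the parent that raises this score the most and stop when no candidate improves it; working with `lgamma` avoids overflowing factorials on large samples.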

Let $\xi$ denote the two assumptions: 1) there are no missing values; 2) the parameters $\theta_{ij}$ are independent and have Dirichlet prior distributions. Under these assumptions, the conditional probability tables can be estimated from the data $D$.
The fourth step is to forecast the market demand. A Bayesian network supports inference in both directions: inputs can predict the outputs and vice versa [19]. So, given the values of the observed nodes, the company calculates the probability distribution of the target nodes to predict the demand for food or to diagnose the likely causes of a perfect product. Then, the company identifies which kinds of food can satisfy most customers' preferences and have vast potential for future development.
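The two directions of inference in the fourth step can be sketched by enumeration on a tiny network. The two-node structure (Sweetness as the parent of Preference) and all probabilities are hypothetical; a real application would use the learned network and a dedicated inference algorithm rather than brute-force enumeration.

```python
from itertools import product

# Hypothetical two-node network: Sweetness -> Preference.
domains = {"Sweetness": ["mild", "sweet"], "Preference": ["dislike", "like"]}
parents = {"Sweetness": [], "Preference": ["Sweetness"]}
cpt = {
    "Sweetness": {(): {"mild": 0.6, "sweet": 0.4}},
    "Preference": {
        ("mild",): {"dislike": 0.5, "like": 0.5},
        ("sweet",): {"dislike": 0.25, "like": 0.75},
    },
}


def joint(assignment):
    """Joint probability as the product of each node's conditional probability."""
    prob = 1.0
    for node in domains:
        key = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][key][assignment[node]]
    return prob


def posterior(target, evidence):
    """P(target | evidence), summing the joint over all unobserved nodes."""
    hidden = [n for n in domains if n != target and n not in evidence]
    weights = {}
    for value in domains[target]:
        total = 0.0
        for combo in product(*(domains[h] for h in hidden)):
            assignment = {**evidence, target: value, **dict(zip(hidden, combo))}
            total += joint(assignment)
        weights[value] = total
    norm = sum(weights.values())
    return {value: w / norm for value, w in weights.items()}


# Prediction: from an observed attribute to the preference.
prediction = posterior("Preference", {"Sweetness": "sweet"})
# Diagnosis: from an observed preference back to its likely cause.
diagnosis = posterior("Sweetness", {"Preference": "like"})
```

The same `posterior` function serves both uses; only the roles of the target and the evidence are swapped.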
Moreover, the Bayesian network can be updated to respond to changing market demand. When new data are obtained, the company can continuously refine the Bayesian network by modifying some local part of it, so that it is able to quickly change existing running processes to satisfy customer requirements.
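A local refinement of this kind can be sketched by keeping running counts for a single node and re-estimating only that node's table when new records arrive. The class name `NodeCounts`, the uniform Dirichlet prior (one pseudo-count per value) and the counts used below are assumptions for illustration.

```python
from collections import Counter


class NodeCounts:
    """Running counts for one node, with a uniform Dirichlet prior (assumed)."""

    def __init__(self, values, prior=1.0):
        self.values = list(values)
        self.prior = prior
        self.counts = Counter()

    def update(self, parent_config, value, n=1):
        """Add n new observations; no other node's table is touched."""
        self.counts[(parent_config, value)] += n

    def probability(self, parent_config, value):
        """Posterior-mean estimate (N_ijk + prior) / (N_ij + prior * r)."""
        n_ij = sum(self.counts[(parent_config, v)] for v in self.values)
        n_ijk = self.counts[(parent_config, value)]
        return (n_ijk + self.prior) / (n_ij + self.prior * len(self.values))


preference = NodeCounts(["dislike", "like"])
preference.update(("sweet",), "like", n=4)
before = preference.probability(("sweet",), "like")   # (4 + 1) / (4 + 2)
preference.update(("sweet",), "dislike", n=2)          # new market data arrives
after = preference.probability(("sweet",), "like")    # (4 + 1) / (6 + 2)
```

Because each node's table depends only on its own counts, new data change the estimate for the affected node without relearning the rest of the network.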

Deduction Graph Model
Through the Bayesian network analysis, the company identifies five different types of foods that will satisfy most customers' preferences: A, B, C, D and E. The company also identifies the features of the foods and the production processes (raw materials, machines, skills and so on) needed to manufacture them. These are denoted a, b, c, d, e, f, g, h, i, j, k, l, m and n, each representing a unique required production process.
Different types of foods require different production processes. Table 3 shows the production processes needed to make each product. For example, producing C requires d, h, m and n.
Having identified the required production processes for the different products, both factory managers are asked to point out the existing production processes available in departments A and B. The existing production processes of department A are identified as c, d and e, whereas the existing production processes of department B are a, b and f.
A quick analysis shows that neither department A nor department B has all the production processes required to produce the five newly identified foods. Thus, to make foods that require new production processes, a department should purchase the production processes from other departments or expand its existing production processes. The selling price for production processes in each department is estimated in Table 4. For example, the selling price for production process c in department A is 1 unit, and the selling price for production process f in department B is 1.5 units. Based on the selling price, the expanding cost for department A is shown in Table 5(a), and for department B in Table 5(b). The expanding cost for buying new production processes takes into account time, labour, energy, funds and so on. In order to produce the new foods, the needed production processes will be obtained by learning from existing production processes or by purchasing from other departments directly.
Based on the above analysis, the two manufacturing departments should focus on different product families. From the learning costs of the production processes, we can see that department A is more suitable for manufacturing A, B and C, whereas department B should be responsible for producing D and E. Table 6 shows the foods to be produced in departments A and B. In Table 6, A, B and C are denoted X1, X2 and X3 respectively, whereas D and E are denoted Y1 and Y2. The possible revenue for each product mix is listed in Table 7. For instance, if department A makes food X1 and department B makes food Y1, the possible profit earned by A is 4.5 and the possible profit earned by B is 3. The assumption is that both departments are willing to collaborate and are ready to communicate in order to maximize the overall profit.
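Under the collaboration assumption, choosing a product mix amounts to maximizing the combined result of both departments. The sketch below assumes that each department's profit is its Table 7 entry minus its expansion cost, which is an interpretation rather than a statement from the paper. Only the (X1, Y1) entry of 4.5 and 3 and the expansion cost of 3.5 for X3 are quoted in the text; every other number is a placeholder chosen so that the example runs.

```python
# Revenue (department A, department B) for each product mix.
# Only the (X1, Y1) entry is quoted in the text; the rest are placeholders.
revenue = {
    ("X1", "Y1"): (4.5, 3.0),
    ("X1", "Y2"): (5.0, 4.0),
    ("X2", "Y1"): (4.0, 3.5),
    ("X2", "Y2"): (5.5, 3.0),
    ("X3", "Y1"): (7.0, 2.5),
    ("X3", "Y2"): (8.0, 4.5),
}
# Expansion cost per product; only the 3.5 for X3 is quoted in the text.
cost = {"X1": 2.0, "X2": 3.0, "X3": 3.5, "Y1": 2.0, "Y2": 3.0}


def profits(mix):
    """Profit of each department for a mix: revenue minus expansion cost (assumed)."""
    rev_a, rev_b = revenue[mix]
    return rev_a - cost[mix[0]], rev_b - cost[mix[1]]


def best_mix():
    """Product mix with the highest combined profit of both departments."""
    return max(revenue, key=lambda mix: sum(profits(mix)))


choice = best_mix()
profit_a, profit_b = profits(choice)
```

With real data, `revenue` would be filled from Table 7 and `cost` from the cheapest expansion found for each product.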

The Competence Network
A production process network can clearly depict the possible ways of expanding production processes to manufacture new foods [14]. The network developed in this case contains compound nodes and considers cyclic situations. Fig. 4(a) shows the expanding process of department A to produce X1, X2, or X3, and Fig. 4(b) shows the expanding process of department B to produce Y1 or Y2 based on its current production processes a, b, and f. Each node represents a production process. An arc shows that there is a connection between two nodes; for example, a → c means production process c can be learned from production process a. There is no arc between d and m, which denotes that learning d from m or m from d is almost impossible. The number on an arc is the cost of obtaining the production process. There are also compound nodes, such as d ∧ e and a ∧ b. A compound node can only be used when all of its component nodes have been obtained. For example, in Fig. 4(a), the production process f can be learned from production processes e, c, and d ∧ e at a cost of 2.5, 2, and 1, respectively. Department A can also purchase production process f from department B at a cost of 1.5. Likewise, e → f → g → i shows a learning sequence that starts from e, learns f, then learns g, and then learns i from g. The final objective of the production process network is to use optimization to find the sequence with the highest profit in food production.
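The choice between learning and purchasing a single production process can be sketched with the figures quoted above for f (learning costs 2.5, 2 and 1, purchase price 1.5). The function name and data layout are hypothetical, and the sketch covers one acquisition step only, not the full network optimization.

```python
# Options for obtaining process f, as quoted in the text:
# (processes that must already be owned, learning cost).
learn_options = [({"e"}, 2.5), ({"c"}, 2.0), ({"d", "e"}, 1.0)]
purchase_price = 1.5  # buying f from department B


def cheapest_acquisition(owned, learn_options, purchase_price):
    """Return (method, sources, cost) for the cheapest way to obtain a process."""
    candidates = [("purchase", frozenset(), purchase_price)]
    for sources, cost in learn_options:
        # A compound node such as d AND e is usable only if every component is owned.
        if sources <= owned:
            candidates.append(("learn", frozenset(sources), cost))
    return min(candidates, key=lambda candidate: candidate[2])


# Department A owns c, d and e, so the compound node d AND e is available.
plan_a = cheapest_acquisition({"c", "d", "e"}, learn_options, purchase_price)
# Without d, the compound node cannot be used and purchasing becomes cheapest.
plan_without_d = cheapest_acquisition({"c", "e"}, learn_options, purchase_price)
```

Treating a compound node as a set of prerequisites makes the availability check a simple subset test.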

Network flow approach
The example problem can be formulated as a linear mixed 0-1 optimization model [14]. However, as the size of the problem increases, mixed 0-1 programming does not run quickly enough. Kim & Hooker (2002) indicate that a minimum-cost flow problem is well suited to mixed 0-1 programming and can be solved better and faster, with the advantage increasing with problem size [19]. We therefore translate the deduction graph model into a minimum-cost flow problem to find an optimal solution.
Let $S$ be the set of the department's existing production processes, $T$ the set of production processes required for the products, and $I$ the set of intermediate production processes. We define a directed graph $G = (V, E)$ with $V = S \cup I \cup T$. For nodes $i$ and $j$ in $G$, $r(i,j)$ is the arc connecting $i$ to $j$, and $w(i,j)$ is the corresponding cost of obtaining $j$ from $i$. Adding a source $s_0$ connected to every $s \in S$ and a sink $t_0$ to which every $t \in T$ is connected gives a new directed graph $G' = (V', E')$, in which each arc $r(i,j)$ has a capacity $c(i,j)$ and a cost
$$w'(i,j) = \begin{cases} 0, & \text{if } r(i,j) = r(s_0, s) \text{ or } r(i,j) = r(t, t_0) \\ w(i,j), & \text{otherwise} \end{cases}$$
Let $f(i,j)$ denote the flow on $r(i,j)$. The minimum cost flow model is then given by
$$\min \sum_{(i,j) \in E'} w'(i,j)\, f(i,j)$$
subject to the flow conservation constraints and
$$0 \le f(i,j) \le c(i,j) \qquad (7)$$
There are many effective algorithms for solving the minimum cost flow problem [20]. After the optimal solution is found, the subgraph of $G'$ in which the flow on each arc is positive is the approximate solution of the problem. Based on this subgraph, the company, taking into account the revenue for each product mix, selects an optimal combination of production strategies to obtain the largest benefit.
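The formulation can be tried on a small instance with a successive-shortest-path solver, sketched below in plain Python. The solver choice is an assumption, since the paper does not name an algorithm. The arcs c → f (cost 2) and e → f (cost 2.5) come from the text, while the arcs into g, the choice of f and g as required processes, and all capacities are hypothetical.

```python
INF = float("inf")


def min_cost_flow(num_nodes, edges, source, sink, required_flow):
    """Successive shortest paths with Bellman-Ford on the residual graph.

    edges: list of (u, v, capacity, cost). Returns (total_cost, flow_per_edge).
    """
    # Residual arcs come in pairs: arc 2k is edge k, arc 2k + 1 is its reverse.
    to, cap, cost = [], [], []
    adj = [[] for _ in range(num_nodes)]
    for u, v, c, w in edges:
        adj[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        adj[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    total_cost, sent = 0.0, 0
    while sent < required_flow:
        dist = [INF] * num_nodes
        prev_arc = [-1] * num_nodes
        dist[source] = 0.0
        for _ in range(num_nodes - 1):
            updated = False
            for u in range(num_nodes):
                if dist[u] == INF:
                    continue
                for a in adj[u]:
                    if cap[a] > 0 and dist[u] + cost[a] < dist[to[a]]:
                        dist[to[a]] = dist[u] + cost[a]
                        prev_arc[to[a]] = a
                        updated = True
            if not updated:
                break
        if dist[sink] == INF:
            raise ValueError("required flow cannot be routed")
        # Bottleneck capacity along the cheapest path, then push flow along it.
        push, v = required_flow - sent, sink
        while v != source:
            push = min(push, cap[prev_arc[v]])
            v = to[prev_arc[v] ^ 1]
        v = sink
        while v != source:
            a = prev_arc[v]
            cap[a] -= push
            cap[a ^ 1] += push
            v = to[a ^ 1]
        sent += push
        total_cost += push * dist[sink]
    return total_cost, [cap[2 * k + 1] for k in range(len(edges))]


# Nodes: 0 = s0, 1 = c, 2 = d, 3 = e (owned by A), 4 = f, 5 = g, 6 = t0.
n = 2  # |T|: the two required processes f and g (hypothetical choice)
edges = [
    (0, 1, n, 0.0), (0, 2, n, 0.0), (0, 3, n, 0.0),  # s0 -> owned, cost 0
    (1, 4, n, 2.0), (3, 4, n, 2.5),                  # c -> f, e -> f (from the text)
    (4, 5, n, 1.0), (3, 5, n, 4.0),                  # f -> g, e -> g (hypothetical)
    (4, 6, 1, 0.0), (5, 6, 1, 0.0),                  # required -> t0, cost 0
]
total_cost, flows = min_cost_flow(7, edges, source=0, sink=6, required_flow=n)
solution_arcs = [edges[k][:2] for k, f in enumerate(flows) if f > 0]
```

In this instance the arc c → f carries two units, so its cost is counted twice even though f only needs to be learned once. This is one reason the positive-flow subgraph is described as an approximate solution of the deduction graph problem.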
Solving the food company's problem as a minimum-cost flow problem [21] gives the solution listed in Table 8: department A chooses X3 and department B chooses Y2. Fig. 5(a) and 5(b) show the corresponding deduction graphs. Taking X3 as an example, the production processes required for X3 (see Table 3) are all expanded by department A itself at a cost of 3.5. The profit of producing X3 is 4.5.
Moreover, the deduction graph model can be extended to solve multi-level food quality problems, in which multiple levels of proficiency are considered for the foods. For example, the food X1 may have several quality levels, designated X11 (Normal), X12 (Good) and X13 (Excellent). Likewise, the foods X2, X3, Y1 and Y2 may have several quality levels (e.g., Y11 (Normal), Y12 (Good), Y13 (Excellent)). Each level of food quality may lead to different outcomes in terms of reputation, intensity of government supervision and customer satisfaction. In this way, the proposed model can help select a feasible path, so that the expansion from the initial production processes to the final products (the five identified foods) is achieved at the lowest cost and with the optimum level of food quality.
Furthermore, this model can be developed into an optimization approach that incorporates the information, skills, services and products (big data) of group decision makers to maximize the overall profit. It works well in cyclic situations and can be used to analyze efficient control of information transmission in the food network. Big data analytics in the food network makes it possible to discover needs and create value, which has implications for how organizations will have to be designed, organized and managed. Hence, we develop a big data harvest model that links large amounts of data to create a coherent picture of a particular problem: it first identifies from big data the products that can meet future markets and then identifies the production processes required to make them.

Discussion
On the one hand, compared with other analytical approaches, Bayesian networks have a number of features that make them suitable for demand forecasting. The results indicate that Bayesian networks are valuable tools for representing the relationships among a set of variables from a combination of big data and expertise in the food supply chain. Through Bayesian network analysis, the food company can build a Bayesian network for food preference, find the types and features that a food product must have in order to be preferred, and decide what to produce.
On the other hand, once the company has identified the types of foods that can meet future markets, it must next translate product demand into production processes and divide processes into tasks and assets. We develop the deduction graph model so that it provides the analytic capabilities firms need to produce a detailed process design. The results indicate that the deduction graph model can effectively help the food company select the product made by each department and combine the departments' respective production processes to make those products and maximize profit. The results also indicate that the network flow approach can be used to find the optimal solution of the deduction graph model with fast specialized algorithms. In the optimal solution, department A produces X3 and department B produces Y2, with corresponding profits of 4.5 and 1.5, respectively.

Conclusions
In this paper, we propose a big data harvest model that converts data into insights to gain competitive advantages in food supply chain management. The purposes of this study are twofold. One goal is to use big data in the food supply chain as inputs to make production decisions. The other is to apply the deduction graph model to translate product demand into processes and divide processes into tasks and assets. First, a Bayesian network can integrate the prior information and sample information in the food supply chain and find cause-and-effect relationships in the data to effectively predict market demand and direct food production.
Second, the results indicate that a deduction graph model is capable of combining the production processes of departments to maximize profit. To find the optimal solution, the deduction graph model can be translated into a minimum-cost flow problem. We have only illustrated the application framework for using big data to make more informed production decisions in the food supply chain. When a company plans and implements the framework, it will also need technological support such as information-gathering techniques and Bayesian network inference techniques. Moreover, the application of big data in other areas of the food supply chain should be addressed through further research.

Fig. 1. Decision making based on big data in the food supply chain

The conditional probability $P(X \mid \pi(X))$ of each node expresses the uncertainty of the interdependence of the variables, where $\pi(X)$ is the parent set of $X$ ($\pi(X) = \emptyset$ if the node $X$ has no parents). Together with the independence assumption, this gives the factorization in Eq. (1) for a Bayesian network consisting of $n$ nodes $X_1, \ldots, X_n$.

Fig. 3. Part of a Bayesian network for food preference
Based on Bayes' theorem, we then have $P(S^h \mid D) = P(D \mid S^h)\, P(S^h) / P(D)$.

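Add a source $s_0$ connected to every $s \in S$ and a sink $t_0$ to which every $t \in T$ is connected, and get a new directed graph $G' = (V', E')$. Let $n$ be the value of $|T|$. The capacities and the costs of the edges are then defined on $G'$.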

Table 1. Big Data in Food Network

Table 2. Search techniques of learning the network

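Nodes shown in Fig. 3: Preference, Sugars, Sweetness, Carotenoids, Colour, Proteins, Particle size, Texture
Here a variable $x_i$ has $r_i$ possible values, $\pi_i$ is the parent set of $x_i$, and $N_{ijk}$ is the number of consumers in $D$ for which $x_i$ takes its $k$-th value while $\pi_i$ takes its $j$-th configuration. Once the network structure is found, the numerical conditional probability table should be determined. Let $\theta_{ijk}$ denote the corresponding conditional probability, that is, the probability that $x_i$ takes its $k$-th value given that $\pi_i$ takes its $j$-th configuration.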

Table 3. Different production processes required by products ("√" means required)

Table 4. The selling price for each production process

Table 5(a). Production process expanding cost for A (A owns competence c, d, and e)

Table 5(b). Production process expanding cost for B (B owns competence a, b, and f)

Table 6. Products in departments A and B

Table 7. Revenues for mixed product set

Table 8. Solutions of the example problem