Big Data in food safety - A review

The massive rise of Big Data generated from smartphones, social media, the Internet of Things (IoT), and multimedia has produced an overwhelming flow of data in both structured and unstructured formats. Big Data technologies that gather and analyse these data are being developed and implemented in the food supply chain. Such technologies demand new approaches to data collection, storage, processing and knowledge extraction. In this article, an overview of recent developments in Big Data applications in food safety is presented. The review shows that the use of Big Data in food safety remains in its infancy but is already influencing the entire food supply chain. Big Data analysis is used to provide predictive insights at several steps in the food supply chain, to support supply chain actors in taking real-time decisions, and to design monitoring and sampling strategies. Lastly, the main challenges that require further research efforts are introduced.


Introduction
The term Big Data is used in various ways but always refers to large volumes of different types of data. Big Data is often produced at high velocity from a large number of different types of sources and demands new tools and methods, such as powerful processors, software and algorithms, to handle it [1]. Big Data applications can be found, to varying extents, in all steps of the food supply chain from farm to fork, where they are used to optimise production while maintaining safety and quality standards. The application of Big Data technologies in food safety control remains in its infancy, but recent developments as reviewed by Marvin et al. [1] demonstrate the potential of using these technologies, which may drive future implementations.
In this review, we explore the extent to which these developments (i.e. Marvin et al. [1]) have been expanded within the realm of food safety Big Data and investigate the extent to which Big Data technologies have delivered on their promises for the food safety domain. To this end, scientific articles reporting Big Data applications in food safety were collected from four different bibliometric databases for the period 2015-2020 using a defined set of Big Data search terms. This search yielded 686 potentially relevant papers, of which 113 were judged relevant after further assessment (i.e. papers dealing with Big Data in food safety). The main topics and applications reported in these articles are summarized in this review. In addition, a bibliometric network analysis was performed to understand the content of the relevant papers and to investigate relationships between key words. Finally, future challenges that require further exploration were identified.

Literature search
The literature reviewed for this article was collected from four literature databases, Institute of Electrical and Electronics Engineers (IEEE), Science Direct, Scopus, and Google Scholar, for the period 2015-2020. To improve the accuracy of the search results, three groups of search terms were used (see Table 1), which yielded a total of 686 papers, of which 113 were relevant (i.e. dealing with Big Data in food safety) and used for this review.

Bibliometric network of Big Data in food safety
The selected relevant articles were investigated using bibliometric network analysis [2]. Figure 1 shows the co-occurrence network visualisation of the abstracts and titles. In the network, each circle represents a term. The size of a circle indicates the number of publications that have the corresponding term in their titles and abstracts. Terms that co-occur frequently tend to be located close to each other in the network. In the abstracts and titles of the relevant publications, the terms appeared in four significant groups. The green group covers terms related to food, system, product, approach, impact and model. The blue group consists of food safety, analysis, development, Big Data and issue. The red group is more related to technology, application, information and traceability. The yellow group covers data, strategy and data framework. In the following sections, the selected publications are further investigated based on their content related to the various steps distinguished in a Big Data framework [1]. These steps are i) data sources and collection, ii) infrastructure, and iii) data analysis.
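The counting step that underlies a co-occurrence network such as Figure 1 can be illustrated with a minimal sketch. It is not the tool used for the review [2]; the titles, the vocabulary and the simple substring matching are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents, vocabulary):
    """Count, per term and per term pair, the number of documents they appear in."""
    term_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        lowered = text.lower()
        # A term is counted once per document, however often it occurs.
        # Matching is by plain substring, which is an illustrative shortcut.
        present = sorted(term for term in vocabulary if term in lowered)
        term_counts.update(present)
        # Every unordered pair of terms present in the same document co-occurs once.
        pair_counts.update(combinations(present, 2))
    return term_counts, pair_counts


# Hypothetical titles, invented for this example only.
docs = [
    "Big Data analysis for food safety monitoring",
    "Traceability technology in the food supply chain",
    "Food safety and traceability with Big Data",
]
vocab = {"big data", "food safety", "traceability", "technology"}

terms, pairs = cooccurrence(docs, vocab)
for pair, count in pairs.most_common():
    print(pair, count)
```

In a network visualisation, the per-term counts would typically set the circle sizes and the pair counts would set how strongly two terms are pulled together.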

Data sources and data collection
Different types of data sources such as (online) databases, internet, omics profiling, sensors [3], mobile phones, and social media are the main channels of obtaining data related to food safety [1]. In addition, new technologies have been implemented in smart surveillance systems to collect data related to food safety. Examples of such technologies are video monitoring [4], sensors and portable devices using Internet of Things (IoT) technology [5], Geographic Information System (GIS), satellite imagery [6] and blockchain technology [7,8].

Online food safety databases
Marvin et al. [1] provided a comprehensive overview of online food safety databases, which contain information on hazards, exposure and surveillance reports. In the European Union (EU), the Rapid Alert System for Food and Feed (RASFF) remains the main online food safety database used by authorities, industry and scientists 1. Other online food safety databases, such as the United States (US) Import Refusal Report (IRR), the Inspection Classification Database (ICD) and the China State Administration for Market Regulation (SAMR) alerts, have appeared (Table 2). In the US, the Food and Drug Administration (FDA) has prepared the IRR system in an effort to provide the public with information on products that have been found to appear in violation of the act. The FDA also conducts inspections, which are reported in the ICD. The ICD is used as a tool to search for the final inspection classifications of many firms and project areas. In China, there is still no official open-access food inspection database. A few researchers have tried to build a food safety inspection database by applying data mining to the officially released online food sampling records [9], and some commercial communities such as foodmate 2 have updated the related food safety databases according to government publications.

Smartphones and handheld devices
Smartphones and handheld devices have been applied in food safety data collection for various purposes such as quality assessment, food inspection, monitoring, behavior management and food safety information communication [10]. For instance, i) smartphone-based digital image colorimetry has been used to classify milk samples, detect milk adulterants and determine protein content [11]; ii) smartphone images have been used to detect food contaminants [12]; iii) a smartphone-based lateral flow imaging system has been used to detect foodborne bacteria in beef and spinach [13]; iv) smartphones have been used as recorders to collect vendors' food safety behaviors in the market, in order to determine possible food safety risks and areas in Canada [14]; and v) smartphones have been used to collect real-time quality assessments of food (i.e. growth levels of foodborne pathogens) using a wireless food label [15]. Other applications of smartphones in food safety can be found in [16], [17] and [18]. As terminal data receivers and recorders, smartphones have played an important role in data collection and will continue to do so in the future.

Social media
Social media is receiving more attention as a potential source of food safety data [19]. Social media platforms such as Facebook, Twitter and YouTube can be used to collect food safety related discussions, opinions or online questionnaires [20]. Web mining is commonly used to collect and mine social media data. By analysing the sentiments and opinions of customers, social media data can be an efficient and valuable route to promote public awareness, understand public perceptions and improve clients' behaviors in food safety governance [21]. For instance, Chung et al. [22] gathered more than 2.6 M tweets of food poisoning cases related to a company incident to study public concerns about food safety after the corporate apologies.

Big Data in food safety-A review Jin et al.

Table 1 Search terms used in the literature review.
Group | Search terms
1 | food safety
2 | big data/big data analysis/big data analytics; big data infrastructure
3.1 | food supply chain; agro-product supply chain; food net-chains; agricultural net-chains; agri-sector
3.2 | food system; food traceability system; agricultural system
3.3 | food monitoring
3.4 | business model; agribusiness; agri-information
3.5 | food additive; pesticide residue; food contamination; physical contamination; chemical contamination; physical hazards; chemical hazards; biological contamination; biological hazards; microorganism, bacteria, zoonose
3.6 | food supply; food processing; food marketing; food purchase; food distribution; food storage; food handling
3.7 | water; meat; fish; poultry; vegetable; fruit; milk; dairy; cereal; egg

Satellite imagery
Satellite imagery data can be used to detect crop growth, forecast crop harvests and improve agricultural monitoring systems, thereby helping to improve the quality of agricultural products. In the EU, Sentinel-2 provides open-access satellite image data that can be used in several applications such as agriculture and forestry, food safety, and risk mapping. The application of satellite imagery in food safety has been reported. The FAO GeoNetwork 3 and the Landsat imagery database 4 contain layers and grids such as water, fields and climate classification, which can also be used as a basis for food safety monitoring, such as the use of satellite images and climatic data to detect plant diseases. The US Department of Agriculture applied remote sensing technology and spatial information to detect food contamination 5, and Mateus et al. [23] used satellite imagery as an early warning system for shellfish safety.

Internet of things (IoT)
IoT is the interconnection of all things (e.g. sensors, devices, machines, computing devices) via the internet or another communication medium (e.g. Wi-Fi, Bluetooth and RFID). New technologies based on IoT are expected to bring safer, more efficient and more sustainable food chains in the near future. Recently, Bouzembrak et al. [3] conducted a review of IoT in food safety and observed that the main applications occurred in food supply chains to trace food products, followed by monitoring of food safety and quality. The majority of the IoT studies in food safety deal with applications related to high-value foods such as meat, cold chain products and agricultural products, using sensors to monitor mainly temperature, humidity and location. Their conclusion was that there are successful implementations of this technology in food safety but that IoT in food safety is still at an early stage of development, which means that further research and innovation are required to capture the full potential that IoT can offer.
Using IoT, devices such as mobile phones, digital cameras and sensors can collect data and transfer them to centralized data infrastructures via Wi-Fi or other channels to facilitate real-time monitoring and control [15,24,25]. Some recent applications in this field are: i) smartphones serving as a fluorescence device and detection information receiver to quantify the concentration of the chemical Ochratoxin A in beer [24]; ii) smartphone-based sensors used in the perishable food supply chain to collect temperature, humidity, GPS location and image data to monitor food quality and safety [10]; iii) a radio frequency-powered sensor to detect total volatile organic compounds in food packages and monitor the variation in food quality [26].
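To make the monitoring idea concrete, the sketch below shows one simple way in which readings of the kind described above (temperature, humidity, GPS location) could be screened once they reach a central store. The record layout, the device name and the 8.0 °C limit are hypothetical choices for illustration and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sensor record; the field names are assumed for this example."""
    device_id: str
    timestamp: str        # ISO 8601 string
    temperature_c: float
    humidity_pct: float
    latitude: float
    longitude: float


def find_excursions(readings, max_temp_c=8.0):
    """Return the readings whose temperature exceeds the assumed limit."""
    return [r for r in readings if r.temperature_c > max_temp_c]


# Hypothetical readings from a single refrigerated shipment.
readings = [
    Reading("truck-01", "2020-03-01T08:00:00", 4.1, 61.0, 51.97, 5.66),
    Reading("truck-01", "2020-03-01T09:00:00", 9.3, 64.5, 52.01, 5.40),
    Reading("truck-01", "2020-03-01T10:00:00", 5.0, 62.2, 52.09, 5.12),
]

alerts = find_excursions(readings)
for r in alerts:
    print(f"{r.device_id} at {r.timestamp}: {r.temperature_c} °C exceeds the limit")
```

A production system would normally evaluate such rules as the data stream in and would combine several signals rather than a single fixed threshold.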

Blockchain technology
The new developments in the use of blockchain technology in the food supply chain are expected to bring safer and more transparent food chains in the near future. The application of blockchain technology in food safety is currently limited to traceability, and issues such as data integrity and tampering still need attention [27]. Several actors in the food chain, such as Walmart and IBM, have demonstrated their interest in blockchain applications for tracking and tracing products. In 2017, IBM collaborated with a few food producers and retailers to enhance food quality control, food safety management and traceability by leveraging blockchain technology. Kim and Laskowski [28] pointed out that current track-and-trace systems do not provide transparency on the provenance of goods because supply chains span international borders. They analysed the combination of IoT ontologies and blockchain technology for a better track-and-trace system and underlined the fundamental role of ontologies in creating blockchain applications for supply chains. Kumar and Iyengar [29] suggested a system implementation using blockchain to enable full traceability to combat food fraud. Their system aims to provide a complete history across all five steps in the rice supply chain and to automate it using smart contracts. Furthermore, several real applications of blockchain technology in food traceability can be found in the following cases: tuna tracking and certification 6, pork meat traceability 7 and wine traceability 8.
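The tamper-evidence property that makes blockchain attractive for traceability can be illustrated with a minimal hash-chained ledger. This is a sketch only: it omits the distributed consensus and smart contracts of a real blockchain, it is not the system of Kumar and Iyengar [29], and the batch identifier and the five step names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, record, previous_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "record": record, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "record": record,
        "previous_hash": previous_hash,
        "hash": block_hash(index, record, previous_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link to the previous block."""
    previous_hash = GENESIS_HASH
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        expected = block_hash(block["index"], block["record"], block["previous_hash"])
        if block["hash"] != expected:
            return False
        previous_hash = block["hash"]
    return True


# Hypothetical supply chain steps for one batch of rice.
chain = []
for step in ["farming", "milling", "packaging", "distribution", "retail"]:
    append_block(chain, {"batch": "RICE-001", "step": step})

valid_before = is_valid(chain)          # the untouched chain verifies
chain[1]["record"]["step"] = "unknown"  # simulate tampering with one record
valid_after = is_valid(chain)           # the stored hash no longer matches
print(valid_before, valid_after)
```

Because each block stores the hash of its predecessor, altering any earlier record invalidates the chain unless every later hash is recomputed, which is what a distributed ledger is designed to make impractical.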

Data format used in food safety
Table 2 Online food safety databases.

The data used in food safety range from unstructured to highly structured and are stored as documents in various formats (e.g. TXT, JSON). For example, Singh et al. [30] collected social media (Twitter) data in TXT and JSON format and implemented a parsing method to extract information from the JSON files into CSV files. Song et al. [31] stored food safety incident cases in a relational database, with each case listed in a row together with its attributes. Alfian et al. [16] collected IoT-generated sensor data from the gateway, which are largely unstructured and continuously generated, and used NoSQL and SQL databases to store the data. Alfian et al. [16] developed a real-time food quality monitoring system that receives sensor data from a smartphone and stores them in a MongoDB database, a flexible document-oriented (non-relational) database from which documents can be retrieved based on their contents.
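A JSON-to-CSV parsing step of the kind described for the Twitter data can be sketched with the Python standard library. The field names (id, created_at, text) and the two sample records are assumptions for illustration; they do not reproduce the method or the data of Singh et al. [30].

```python
import csv
import io
import json


def tweets_to_csv(json_lines, fields=("id", "created_at", "text")):
    """Convert newline-delimited JSON records into CSV text with the chosen fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Keep only the selected fields; missing fields become empty cells.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


# Hypothetical records, invented for this example only.
raw = [
    '{"id": 1, "created_at": "2016-01-05", "text": "Beef was discoloured", "lang": "en"}',
    '{"id": 2, "created_at": "2016-01-06", "text": "Great steak, no issues"}',
]

csv_text = tweets_to_csv(raw)
print(csv_text)
```

Using the csv module rather than joining strings by hand means that text containing commas or quotes is escaped correctly, which matters for free-text social media content.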

Big Data infrastructure
Supercomputing centers used in food safety
Supercomputing has become a necessary tool to tackle challenges associated with Big Data. The US has been devoted to supercomputing for a long time; the Industry Council of the US Exascale Computing Project (ECP) was formed in February 2017 to facilitate information exchange between the ECP and the industrial user community. Furthermore, the FDA has applied supercomputing to conduct research and support food safety 9.
The EU also attaches great importance to the development of supercomputing infrastructures. At the moment, the EU has constructed 8 sites for supercomputing centers to support major applications in bio-engineering, drug and material design, weather forecasting and climate change 10.
In China, 7 national supercomputing centers have been built in Tianjin, Jinan, Changsha, Shenzhen, Guangzhou, Wuxi and Zhengzhou. A food safety traceability platform has been developed 11 by the Chinese government, collecting food traceability data from 31 provinces and connecting the national supercomputing centers. Its aim is to realise food traceability from farm to plate and to provide services for food production enterprises, food traceability, security and supervision.
"FDA Fronts Pivotal Life Science Trend in 2020" detailed the FDA's technology infrastructure modernization plan and its far-reaching impact on life science 14,15. In China, the Guizhou food and drug administration issued the food safety cloud system in 2014. It has since been built into an intelligent food safety supervision system, an "internet +" inspection system, a traceability certification system and a Big Data platform for government, enterprises, inspection and testing institutions, and other social agencies 16,17 [32].

Data analysis
Data analysis is the core of data processing, and the value contained in the data is realised in this step. In the last five years, several types of methods have been used to extract knowledge from Big Data in food safety: 1) content analysis; 2) econometric analysis; 3) recommendation systems; and 4) machine learning.

Content analysis
Content analysis is a research method for qualitative data (i.e. words, themes, text). It is applied in food safety to clarify or test essential food safety facts, reveal hidden details and predict trends. Content analysis has been carried out on a variety of food safety data, such as food safety incidents [33], food fraud cases [34], and food spot check and sample inspection data [35,36,37]. It is gradually becoming a basic method for describing the background in the literature and is often combined with food safety database construction or machine learning [9,30,38].

Econometric analysis
Econometric analysis is widely used to study food safety issues in international food trade based on food refusal reports or border inspection cases, such as hidden trade protectionism or the impact of the import rejection risk at border inspection on food safety. Scientists have used econometric analysis to explore the impact of import refusals (i.e. food safety problems) on the reputation and the economy of developing countries [39,40]. Treating food safety as a hidden trade barrier is another hot topic. In recent years, many researchers have devoted themselves to providing evidence of non-tariff trade barriers by analysing border sampling cases and import refusals [41,42,43].

Recommendation system
Recommendation systems have been widely used in e-commerce to target recommendations by filtering on consumers' preferences, interests or behavior [1,44], but this approach has rarely been applied in food safety. Recently, Singh et al. [30] proposed a recommendation system based on social media and Big Data analytics to inform supply chain (SC) decision makers about food safety and quality issues, using the beef supply chain as a case.

Machine Learning (ML)
ML is one of the analysis methods used in food safety. Using algorithms that learn from input data, ML tools can build models with high accuracy to identify, predict and make decisions for dealing with complex food safety issues [45]. Several ML applications are listed below and explained with corresponding cases.
i) Sorting and classification of food to realise quality assessment and management by applying computer vision and deep learning methods [46,47,48,49,50]. Anil et al. [46] used a support vector machine (SVM) classifier to identify the freshness of mushrooms based on the dynamic vision of "enzymatic browning". Thinh et al. [47] applied image processing and artificial neural networks to identify the quality of three commercial mango species.
Hossain [54,61] recognised the complexity of the food supply chain and constructed a system approach to identify risk factors and their interactions, aiming to present and forecast the occurrence of hazards in food. Zheng et al. [63] proposed a neural network (NN) model to predict public behavior after a food safety incident. Chang et al. [64] presented a food safety alarm system that was developed using random forest (RF) and decision tree (DT) methods to extract value from food Big Data (i.e. food electronic invoices).
iii) Extracting diverse food safety information from digital text data. This topic has recently received scholars' attention and relies heavily on machine learning methods [65]. A few researchers have showcased examples in this domain. Using over 2.6 M tweets about a company's food poisoning cases and a supervised machine learning approach, Chung et al. [22] found that corporate apologies had little influence in removing public concerns about food safety after a crisis. Magalhães et al. [25] processed daily Portuguese food safety reports and complaints using Naive Bayes and Support Vector Machine classifiers to identify the responsible entity.
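As an illustration of how a text classifier of the Naive Bayes family works, the following is a minimal multinomial Naive Bayes with Laplace smoothing written with the standard library only. It is a sketch under stated assumptions and not the pipeline of Magalhães et al. [25]: the complaint texts, the two labels ("retailer", "restaurant") and the whitespace tokenizer are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer; deliberately simplistic."""
    return text.lower().split()


def train(examples):
    """Collect document counts per label and word counts per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical complaints labelled with the responsible entity.
training = [
    ("expired yoghurt sold in supermarket", "retailer"),
    ("mouldy bread on supermarket shelf", "retailer"),
    ("undercooked chicken served at restaurant", "restaurant"),
    ("dirty kitchen in restaurant", "restaurant"),
]
model = train(training)

print(predict(model, "expired milk in supermarket"))
print(predict(model, "dirty restaurant kitchen"))
```

Real applications would train on thousands of labelled reports and would usually rely on an established library together with proper text preprocessing and evaluation on held-out data.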

The challenges
While it is clear that food safety can benefit from the features that Big Data tools can offer, a number of challenges should be addressed to take full advantage of them [19]. According to most experts, the biggest challenges with the data generated along the food supply chain are related to data FAIRness (i.e. Findability, Accessibility, Interoperability, and Reusability (FAIR)), data quality and lack of standardization. For example, farmers use different farm management systems, which means that the standardization of farm management data (e.g. variable names) is an issue.
One of the challenges that may have limited the uptake of IoT technology in food safety is that the data produced today by IoT devices can be difficult to interpret, communicate and share because of the lack of standardized communication protocols [3].
Applying the FAIR guiding principles to IoT devices will enable both an Internet of Data and services, helping data and algorithms to find each other, communicate, and remain available for sharing and reuse [66]. In addition, several issues are associated with IoT security in food safety, such as inadequate hardware and software security. Any insecure IoT node along the food supply chain can be a vulnerable point for the security of the entire IoT system and for the rest of the internet.
Handling Big Data is challenging and time consuming, and it requires a large computational infrastructure to ensure successful data processing and analysis in reasonable time. Although cloud computing has been adopted by many organizations as a solution, research on Big Data in food safety using cloud computing technology remains in its infancy. Several research challenges such as scalability, availability, data integrity, security, privacy and legal issues have not been fully addressed.
The application of blockchain technology in food safety is promising and is expected to bring safer and more transparent food chains in the near future, but it is still immature and hard to apply because of its complexity. Currently, its application in food safety is limited to traceability, and issues such as data integrity and governance still need more attention.

Conclusion
This study presented an overview of recent developments in Big Data applications in food safety. The review showed that the main channels of obtaining data related to food safety are online databases, the internet, sensors, mobile phones, and social media. In the last five years, new technologies have been implemented in smart monitoring systems to collect data related to food safety, such as video monitoring, sensors and portable devices using IoT technology, GIS, satellite imagery, and blockchain technology.
Big Data issues require a large computational infrastructure to ensure successful data processing and analysis in reasonable time. Supercomputing centers, high performance computing infrastructures and cloud computing have been implemented in the US, the EU, and China to enable research in Big Data. Although these infrastructures have been adopted in other research domains as a solution, research on Big Data in food safety using cloud computing technology is still in its infancy.
Overall, the results showed the great potential of this technology, and successful applications have been reported to predict, monitor and control food safety in the food supply chain. Many more applications are expected to follow rapidly, but it is also clear that, to exploit the full potential of Big Data, several hurdles must be tackled, including societal, governance and technical issues.