Machine-learning assisted modelling of multiple elements for authenticating edible animal blood food

Highlights
• The critical elements for identifying the species of animal blood food were selected.
• Elemental fingerprinting coupled with ELM was proposed for species identification of animal blood food.
• The optimal ELM model for identifying the species of animal blood food was constructed.
• The absolute and relative contents of 25 elements in animal blood food were reported for the first time.


Introduction
Animal blood is rich in protein of high biological value, so it is accepted as a value-added component of foods or dietary supplements in many societies (Bah, Bekhit, Carne, & McConnell, 2013; Toldrá, Aristoy, Mora, & Reig, 2012). A high-value animal blood food known as "blood tofu", usually made by simply heating fresh edible animal blood, is widely sold in supermarkets and restaurants in China. Generally, blood from duck, pig and chicken is used for preparing blood tofu. Owing to its unique taste and texture, duck blood tofu is the most popular in China. Its price is therefore usually higher than that of other animal blood tofu, which creates an incentive for food fraud, adulteration, and mislabeling.
Currently, the adulteration or replacement of high-value duck blood tofu with cheaper animal blood remains a serious social problem, as indicated by media reports and administrative punishment cases (Zhang, Wang, Ma, Li, & Li, 2020). Hence, a method for authenticating animal blood foods is needed. Specific biomarkers, such as DNA (Cheng, He, Huang, Huang, & Zhou, 2014; El-Sayed, Mohamed, Ashry, & Abd El-Rahman, 2010; Unajak et al., 2010) and peptides, have shown great potential for this purpose. However, although biomarker-based analytical techniques are accurate, they can be problematic in practice because of sample contamination and the intentional addition or removal of the biomarkers by food counterfeiters. Therefore, a novel technology for authenticating animal blood foods is urgently needed.
Elemental fingerprinting has a unique advantage over biomarker-based analytical techniques: it involves all chemical constituents of a food, which makes it difficult for food counterfeiters to adjust so many elements and their contents. Recently, the elemental fingerprint has been used to trace the origin of green coffee beans (Endaye et al., 2020), determine the geographic origin of salmonid (Han, Dong, Li, Wei, Zhou, & Gao, 2020), discriminate the geographical origin and species of China's cattle bones (Zhang et al., 2021), and authenticate the geographical origin of Australian Cabernet Sauvignon wines (Ranaweera, Gilmore, Capone, Bastian, & Jeffery, 2021), to name but a few. Nevertheless, to the best of the authors' knowledge, no study has reported the authentication of animal blood foods based on elemental fingerprints. Therefore, the present work aims to develop a novel technique for species authentication of the edible animal blood gel (EABG) using the elemental fingerprint coupled with machine learning modelling.

Sample preparation
Fresh animal blood samples were collected in Suzhou, China, from July 20th to August 7th, 2021. Duck and chicken blood samples were purchased from a vendor offering on-site slaughter at a local farmers' market; a local slaughterhouse provided pig and bovine blood samples; and the sheep blood samples were purchased from a local hotpot restaurant. All fresh blood samples were transported to the laboratory in an ice-filled box and, after natural sedimentation, sterilized at 121 °C for 30 min. All blood samples were homogenized separately and then frozen at −20 °C. In total, thirty blood samples were collected for each species, yielding 150 EABG samples for multi-element measurements.

Multi-element analysis of the EABG
Following Chinese standard GB 5009.268-2016 (National Health and Family Planning Commission of the People's Republic of China, 2016a), the trace elements lithium (Li), beryllium (Be), boron (B), aluminum (Al), titanium (Ti), vanadium (V), chromium (Cr), manganese (Mn), iron (Fe), cobalt (Co), nickel (Ni), copper (Cu), zinc (Zn), arsenic (As), selenium (Se), rubidium (Rb), strontium (Sr), cadmium (Cd), barium (Ba), thallium (Tl), and lead (Pb) were determined by inductively coupled plasma mass spectrometry (ICP-MS). Atomic absorption spectroscopy (AAS) was used for the macro-elements: potassium (K) and sodium (Na) were determined according to GB 5009.91-2017 (National Health and Family Planning Commission of the People's Republic of China, 2017a), and calcium (Ca) and magnesium (Mg) were determined according to GB 5009.92-2016 (National Health and Family Planning Commission of the People's Republic of China, 2016b) and GB 5009.241-2017 (National Health and Family Planning Commission of the People's Republic of China, 2017b), respectively. All elemental analyses followed Chinese national standards, which makes the procedure convenient for potential users to follow.

Chemometrics and software
Herein, the absolute and relative contents of the measured elements were used as the original datasets for machine learning modelling. Because the contents of the trace elements and macro-elements differ by orders of magnitude, Z-score normalization was first performed on the original datasets to remove these differences in scale. Then, stepwise discriminant analysis (SWDA) was compared with one-way analysis of variance (ANOVA) for selecting crucial elements, and principal component analysis (PCA) and Fisher linear discriminant analysis (Fisher LDA) were compared for dimension reduction. Finally, the extreme learning machine (ELM) was selected for modelling because of its simple network structure, good generalization ability and low time consumption (Han, Zhang, Aheto, Feng, & Duan, 2020; Huang, Zhu, & Siew, 2006).
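The Z-score step can be illustrated with a minimal Python sketch. It assumes that each column of the dataset is one element and that the sample standard deviation (n − 1) is used; the text does not state which standard deviation was applied, and the values below are hypothetical, not measured contents.

```python
from statistics import mean, stdev

def zscore_columns(rows):
    """Scale every column (element) to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]  # sample SD (n - 1): an assumption
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

# Hypothetical contents of a macro-element and a trace element in three samples
samples = [
    [1500.0, 0.010],
    [1700.0, 0.030],
    [1600.0, 0.020],
]
scaled = zscore_columns(samples)
print(scaled)
```

After scaling, both columns are on the same scale even though the raw values differ by several orders of magnitude. A column with zero variance would need separate handling, since it would cause a division by zero here.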
During ELM modelling, the number of hidden neurons and the activation function of the hidden layer were optimized. The number of hidden neurons was optimized by trial and error over the range [1, 100]. The frequently used activation functions for the hidden layer are given by the following formulas:
Sig: $f(x) = \frac{1}{1 + e^{-x}}$
Sin: $f(x) = \sin(x)$
Hardlim: $f(x) = 1$ if $x \ge 0$, otherwise $f(x) = 0$
where $x$ is the input to each function. The performance of the constructed ELM model was evaluated by the recognition accuracy, calculated by dividing the number of correctly predicted samples by the total number of samples in the training or test set. All algorithms in this work were implemented in Matlab version 7.14 (Mathworks, Natick, USA) running on Windows 10.
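For reference, the three activation functions named in this work (Sig, Sin, and Hardlim) can be written as a short Python sketch. It assumes their commonly used definitions; the function names are illustrative and are not taken from the authors' Matlab code.

```python
import math

def sig(x):
    """Sigmoid, written so that large negative inputs do not overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sin_act(x):
    """Sine activation."""
    return math.sin(x)

def hardlim(x):
    """Hard-limit (step) activation: 1 for x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

for value in (-2.0, 0.0, 2.0):
    print(value, sig(value), sin_act(value), hardlim(value))
```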

Element contents in different species of EABG
Results of the elemental analysis of the EABG samples are shown in Table 1. There was no significant difference in the contents of Li, Be, Ti, Co, As, Cd, Tl, and Pb among EABG from different species. For B, Al, Mn, and Na, only two species differed significantly from each other, while neither differed significantly from the remaining three: B content in bovine blood gel was significantly higher than that in sheep blood gel, but similar to that in blood gels made from duck, chicken, and pig; Al content in pig blood gel was significantly higher than that in bovine blood gel, but similar to that in duck, chicken, and sheep blood gels; Mn content in chicken blood gel was significantly higher than that in bovine blood gel, but similar to that in duck, pig, and sheep blood gels; Na content in bovine blood gel was significantly higher than that in pig blood gel, but similar to that in duck, chicken, and sheep blood gels. For Sr, Ba, Mg, and Ca, two species differed significantly from the remaining three (Table 1): Sr content in bovine and sheep blood gels was significantly higher than that in duck, chicken, and pig blood gels; Ba content in bovine and sheep blood gels was also significantly higher than that in duck, chicken, and pig blood gels; Mg content in duck and pig blood gels was significantly higher than that in chicken, bovine, and sheep blood gels; Ca content in pig and sheep blood gels was significantly higher than that in duck, chicken, and bovine blood gels.
Table 1
The absolute content of the elements measured in blood gels prepared from duck, chicken, bovine, pig, and sheep. Results are expressed as mean values ± standard deviation, n = 30. Values in the same row with different superscripts were significantly different (P < 0.05).
Table 1 also shows that Fe content in the EABG samples increased in the following order: chicken, pig and duck; no significant difference in Fe content was found among the duck, bovine and sheep samples. Cu content increased in the following order: bovine, pig and duck; Cu content was similar between pig and sheep blood gels, and between duck and chicken blood gels. K content increased in the following order: pig, duck and bovine; K content was similar between duck and chicken blood gels, and between bovine and sheep blood gels. Zn content in bovine blood gel was significantly higher than that in the other animal blood gels. V content in bovine blood gel was significantly higher than that in sheep blood gel, but did not differ significantly from that in the remaining animal blood gels. Cr content in bovine blood gel was significantly higher than that in duck and sheep blood gels, but did not differ significantly from that in chicken and pig blood gels. Se content increased in the following order: pig, sheep and chicken; Se content was similar between duck and pig blood gels, as well as between sheep and bovine blood gels.
To further explore differences in the multi-element distribution of EABG, the relative content of each measured element, obtained by dividing the content of a single element by the total element content of the sample, was also analyzed; the results are shown in Table 2.
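The relative content defined here (single element content divided by the total element content of the sample) can be sketched in Python as follows. The element values are hypothetical and are assumed to share one unit.

```python
def relative_content(sample):
    """Divide each element content by the total element content of the sample."""
    total = sum(sample.values())
    return {element: value / total for element, value in sample.items()}

# Hypothetical contents for one sample (same unit for every element)
sample = {"K": 600.0, "Na": 300.0, "Fe": 100.0}
rel = relative_content(sample)
print(rel)
```

By construction, the relative contents of one sample sum to 1.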

Selection of key elements and dimension reduction for modelling
SWDA and one-way ANOVA were used separately to select key elements for machine learning modelling. For the absolute content, SWDA selected nine elements as the key variables: B, Fe, Ni, Cu, Sr, Na, Mg, K, and Ca. For the relative content, SWDA selected eight elements: Fe, Ni, Cu, Zn, Sr, Na, Mg, and K. One-way ANOVA selected all the tested elements except Li, Be, Ti, Co, As, Cd, Tl, and Pb for the absolute content, and all the measured elements except Be, B, Ti, Co, Tl, and Pb for the relative content.
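The statistic behind the one-way ANOVA screening can be sketched as below. This sketch only computes the F statistic for one element across species groups. Turning F into a P value requires the F distribution, which the Python standard library lacks (scipy.stats.f_oneway is one option). The group values are hypothetical.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)                                   # number of groups
    n = sum(len(g) for g in groups)                   # total number of samples
    grand = mean([v for g in groups for v in g])      # grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical contents of one element in three species groups
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = anova_f(groups)
print(f_value)
```

An element would be retained as a key variable when the P value derived from its F statistic falls below the chosen significance level (P < 0.05 in the tables of this work).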
Afterwards, PCA and Fisher LDA were applied comparatively for dimension reduction of the selected key elements. The cumulative contribution rates of the leading principal components (PCs) and discriminant functions (DFs) are shown in Fig. 1. For the absolute content dataset, the top 8 PCs and the top 5 PCs could represent the key elements selected by ANOVA and SWDA, respectively; for the relative content dataset, the top 9 PCs and the top 5 PCs could represent the key elements selected by ANOVA and SWDA, respectively. Fig. 1 also shows that the top 3 DFs could represent each of the corresponding datasets when SWDA was used.
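Choosing the number of components from a cumulative contribution rate can be sketched as follows. The text does not state the threshold that was used, so the 0.85 cut-off and the eigenvalues below are hypothetical.

```python
def components_needed(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative
    contribution rate reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(eigenvalues), cumulative  # threshold not reached: keep all

# Hypothetical eigenvalues of a covariance matrix
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
n_components, rate = components_needed(eigenvalues, threshold=0.85)
print(n_components, rate)
```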

Results of ELM models
ELM models with the different inputs obtained in section 3.2 were constructed and optimized for predicting the species of the EABG. During ELM modelling, one-third of the samples in each group were selected as the prediction set via the Kennard-Stone algorithm (Zhang et al., 2017). The remaining samples were used as the training set.
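The Kennard-Stone selection can be sketched in Python as below, assuming Euclidean distance and the standard max-min rule. How the authors configured the algorithm is not specified beyond the sentence above, so this is an illustration rather than their implementation, and the points are hypothetical.

```python
import math

def kennard_stone(points, n_select):
    """Return the indices of n_select (>= 2) points chosen by Kennard-Stone."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the point whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected

# Hypothetical samples of one species group; one-third go to the prediction set
group = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.5, 0.0),
         (6.0, 0.0), (7.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
prediction_idx = kennard_stone(group, n_select=len(group) // 3)
training_idx = [i for i in range(len(group)) if i not in prediction_idx]
print(prediction_idx, training_idx)
```

Applied to each species group of 30 samples, this rule would pick 10 samples per group, which agrees with the 50 prediction samples reported in this work.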
In ELM theory, the input weights and the hidden-layer biases are generated randomly. Hence, each ELM model was run 12 times for performance comparison. Table 3 shows the performance of the constructed ELM models on the testing samples. The table indicates that the optimal ELM models were obtained when one-way ANOVA was used to select key elements and Fisher LDA was used to reduce dimension. For the absolute content of the measured elements, the ELM model with the Sig activation function offered an identification accuracy above 90%; for the relative content, the ELM models with the Sig and Sin activation functions both offered identification accuracies of not lower than 93.0%.
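The training scheme described above (random input weights and biases, output weights solved in one step, repeated 12 times) can be sketched in pure Python. Several choices are assumptions for illustration: one-hot outputs instead of a single output node, a small ridge term in the least-squares step for numerical stability, the Sin activation, and toy two-class data that are not EABG measurements.

```python
import math
import random
from statistics import mean, stdev

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))] for i in range(n)]

def hidden(row, w, bias):
    """Hidden-layer outputs with the Sin activation."""
    return [math.sin(sum(wi * xi for wi, xi in zip(wr, row)) + b)
            for wr, b in zip(w, bias)]

def train_elm(x, labels, n_classes, n_hidden, rng, ridge=1e-6):
    """Random input weights and biases; output weights by least squares."""
    w = [[rng.uniform(-1, 1) for _ in x[0]] for _ in range(n_hidden)]
    bias = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden(row, w, bias) for row in x]
    t = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Normal equations with a small ridge term (an assumption, for stability):
    # (H^T H + ridge * I) beta = H^T T
    hth = [[sum(r[i] * r[j] for r in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [[sum(r[i] * tr[c] for r, tr in zip(h, t)) for c in range(n_classes)]
           for i in range(n_hidden)]
    return w, bias, solve(hth, htt)

def predict(model, row):
    """Predicted class: index of the largest output score."""
    w, bias, beta = model
    h = hidden(row, w, bias)
    scores = [sum(hi * bi[c] for hi, bi in zip(h, beta))
              for c in range(len(beta[0]))]
    return scores.index(max(scores))

def accuracy(model, x, labels):
    """Correctly predicted samples divided by the total number of samples."""
    return sum(predict(model, row) == y for row, y in zip(x, labels)) / len(labels)

# Toy two-class data (hypothetical)
offsets = [(-0.2, 0.1), (0.1, -0.2), (0.2, 0.2), (-0.1, -0.1), (0.0, 0.15), (0.15, 0.0)]
class0 = [(-1 + dx, -1 + dy) for dx, dy in offsets]
class1 = [(1 + dx, 1 + dy) for dx, dy in offsets]
train_x, train_y = class0[:4] + class1[:4], [0] * 4 + [1] * 4
test_x, test_y = class0[4:] + class1[4:], [0] * 2 + [1] * 2

# Repeat training 12 times because the random weights change with every run.
accuracies = []
for seed in range(12):
    model = train_elm(train_x, train_y, n_classes=2, n_hidden=5, rng=random.Random(seed))
    accuracies.append(accuracy(model, test_x, test_y))
print(f"test accuracy: {mean(accuracies):.3f} ± {stdev(accuracies):.3f}")
```

Reporting the mean and standard deviation over repeated runs mirrors the way the accuracies are summarized in Table 3 (n = 12).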
The performances of the ELM models built on the absolute and relative content datasets, and with one-way ANOVA and SWDA, were also compared. The ELM model using Fisher LDA for dimension reduction (88.3 ± 5.35%) performed significantly better than that using PCA (66.6 ± 8.31%); however, the performances of the ELM models built on the absolute content (76.1 ± 13.9%) and the relative content (78.8 ± 11.7%) of the measured elements showed no significant difference. Comparing one-way ANOVA and SWDA for variable selection, the key elements selected by SWDA were all included in the datasets selected by one-way ANOVA (see part 3.2), so the ANOVA datasets contained more specific elemental information than the SWDA datasets for identifying EABG samples.
Figure panels: (a) ANOVA on the absolute content using PCA; (b) SWDA on the absolute content using PCA; (c) ANOVA on the relative content using PCA; (d) SWDA on the relative content using PCA; (e) ANOVA on the absolute content using Fisher LDA; (f) SWDA on the absolute content using Fisher LDA; (g) ANOVA on the relative content using Fisher LDA; (h) SWDA on the relative content using Fisher LDA.
Finally, across the 288 ELM modelling tests, the best ELM model was obtained with the relative content and the Sin activation function. Its network structure was 3-12-1, and it gave the highest accuracy of 96.0% for the prediction set, meaning that only two of the 50 unknown samples were misclassified. In the training set, only two samples were misclassified as well, giving an accuracy of 98.0%. This suggests that elemental fingerprints combined with ELM have great potential for authenticating edible animal blood foods.
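The reported counts can be checked with simple arithmetic, sketched here in Python. The factorization of the 288 tests (2 datasets × 2 selection techniques × 2 reduction methods × 3 activation functions × 12 repeats) is an assumption that is consistent with the Table 3 caption. It is not stated explicitly in the text.

```python
n_species, per_species = 5, 30
n_total = n_species * per_species      # 150 EABG samples
n_test = n_total // 3                  # one-third as the prediction set: 50
n_train = n_total - n_test             # remaining training samples: 100

test_accuracy = (n_test - 2) / n_test      # two misclassified samples
train_accuracy = (n_train - 2) / n_train   # two misclassified samples

# Assumed breakdown of the modelling tests
n_tests = 2 * 2 * 2 * 3 * 12

print(n_test, n_train, test_accuracy, train_accuracy, n_tests)
```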

Conclusions
A method based on the elemental fingerprint coupled with machine learning modelling was proposed for identifying EABG species. The results suggest that: (1) both the absolute and relative contents of the measured elements could be used for modelling; (2) Fisher LDA was significantly better than PCA for dimension reduction; (3) the optimal ELM model was obtained with the relative content of the measured elements and the Sin activation function, offering identification accuracies of not less than 96% in the training and test sets. It can be concluded that the elemental fingerprint in conjunction with machine learning modelling has great potential for the species authentication of edible animal blood foods. This work presented the multi-element content of EABG for the first time and developed a method for authenticating EABG species, which can be used to regulate the edible animal blood food market and thereby help prevent illegal adulteration and unfair competition.

Ethical Approval
The authors declare that this article does not contain any studies with human or animal subjects.

Informed Consent
Not applicable, as this study does not include any human participants.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 3
The identification accuracies of the ELM models constructed for the unknown samples with different inputs, namely two original datasets (the absolute and relative contents of the measured elements), two techniques for key element selection (stepwise discriminant analysis (SWDA) and one-way ANOVA), two methods for dimension reduction (principal component analysis (PCA) and Fisher linear discriminant analysis (Fisher LDA)), and three activation functions (Hardlim, Sig, and Sin). Results are expressed as mean values ± standard deviation, n = 12. Values in the same column with different superscripts were significantly different (P < 0.05).