Mathematical Modeling of the Film Influence on the Salting Time of Mozzarella Cheese in a Static and Dynamic System: Application of Artificial Neural Networks of the Multilayer Perceptron Type

NaCl and KCl diffusion in the film formed on the cheese surface during salting was simulated by the finite element method. The time and salt concentration values on the cheese surface were determined, tabulated, and presented to a multilayer perceptron (MLP) neural network for regression modeling. The samples were divided into 70, 15, and 15% for training, testing, and validation, respectively. The networks with the best performance had 5 to 12 hidden layers. Tukey's test showed no significant difference, at the 5% level, between the time value used and the mean value modeled for training, testing, and validation for NaCl. For KCl, a significant difference was observed only for 2 training samples and 1 test sample. Sensitivity analysis showed that the discrete variable Z, which represents the static and dynamic systems, was the most important in the construction of the models.


Introduction
Mozzarella cheese is one of the most consumed cheeses in the world. Initially, it was produced from buffalo milk, but nowadays bovine milk is also used. It has high nutritional value, comes in several shapes, has filament-like characteristics, and is used as an ingredient in pizza, lasagna, etc. In Brazil, it is the type of cheese with the highest production, corresponding to 28.4%. 1-3 Because of the considerable amount of water in mozzarella cheese, 45-52%, 2 the salting process is carried out using sodium chloride to prevent bacterial proliferation. NaCl is a determining factor in biochemical, flavor, and aroma changes in cheese. 4 Recently, some studies 5,6 have demonstrated the effect of volatile organic compounds on the aromatic characteristics of buffalo mozzarella cheese and the impact of the cheese structure on aspects of cardiovascular health. However, excessive sodium consumption can increase consumers' blood pressure. Thus, the search for products with lower sodium content and without major sensory changes is increasing. 7,8 Potassium chloride is an excellent partial substitute for sodium chloride, since high potassium intake increases sodium excretion by the kidneys, resulting in an antihypertensive effect. 9,10 Cheese can be salted by dry salting, in which the dough is covered directly with salt, or by immersion in a saturated solution, which, with industrial advances and research, has led to a reduction in the process time. Salting is therefore an important step in cheese production, consisting of immersion in static or dynamic brine, in which the salt spreads through the solid by diffusion mechanisms. 4,9 Because the salting of mozzarella cheese consists of the transfer of ions into the food, driven by the concentration gradient between the brine and the biosolid, many models of water loss and solute gain are based on the hypothesis that the mass transfer can be described by Fick's diffusion equation (second law) in a non-stationary regime. 11,12
When several solutes diffuse, Fick's generalized law is assumed since, in addition to the main diffusion coefficient of each solute, the cross-diffusion coefficients that couple the flow of one solute to that of the other are also needed. 13 However, when a fluid is in contact with a solid surface, a film is formed. If there is mass transfer between the surface and the fluid, the flux has to pass through this stationary layer, which acts as a resistance. 14,15 Therefore, the diffusion process can comprise a series of mass transfer mechanisms, and both the internal and the external resistance to mass transfer must be taken into account. 7,14,15 When it comes to food biosolids, the diffusion and the formation of the film on the surface may depend on the geometry and morphology of the solid. In this sense, data analysis tools are necessary to better understand the effects of these factors during the process. 16 One of these tools is the artificial neural network of the multilayer perceptron (MLP) type, used for pattern classification 16 and, more recently, for planning and management of water quality and for predicting total dissolved solids in water. 17,18 Its architecture consists of an input layer with a neuron for each variable used, one or more intermediate layers of neurons, forming decision boundaries, and an output layer that depends on how many parameters will be classified and how they will be represented. 16,19 The objective of this work was to apply MLP neural networks to study the diffusion behavior of sodium and potassium in the film formed on the mozzarella cheese surface during salting by immersion, using the salting time, salt concentration, and static and dynamic brine as input data, and the time estimated by the model as output data.

Mozzarella cheese brine
A brine solution of 15 L containing 5% (m/v) of salts was prepared. The salt composition was 30% KCl (KCl, Synth, Diadema, Brazil) and 70% NaCl (NaCl, Panreac, Barcelona, Spain), according to Borsato et al. 10 To ensure a constant salt concentration during the salting process, a brine volume approximately 30 times greater than that of the cheese samples was used. The samples were arranged in a holder made of nylon wires and submerged, since the mozzarella cheese was denser than the brine. The diffusion processes were performed in static and dynamic brine, the latter with a solution flow of 520 L h$^{-1}$, at a constant temperature of 20 °C (± 1 °C), according to Bordin et al. 7

Determination of sodium and potassium chloride
The concentrations of sodium and potassium chloride in the mozzarella cheese samples were measured according to Bordin et al., 7 with modifications, using a photometer, model B-462 (Micronal, São Paulo, Brazil), with an air pressure of 0.8 kgf cm$^{-2}$ and 1.5 kgf cm$^{-2}$ air pump pressure, using butane gas.

Finite element method simulation
The simulation was performed using the COMSOL Multiphysics® software version 5.2 (COMSOL, Inc., Burlington, MA). 20 The parameters used in the simulation were: main diffusion coefficients ($D_{11,\mathrm{NaCl}} = 1.12 \times 10^{-9}$ m$^{2}$ s$^{-1}$ and $D_{22,\mathrm{KCl}} = 0.91 \times 10^{-9}$ m$^{2}$ s$^{-1}$), cross-diffusion coefficients ($D_{12,\mathrm{NaCl}} = 1.70 \times 10^{-10}$ m$^{2}$ s$^{-1}$ and $D_{21,\mathrm{KCl}} = 1.69 \times 10^{-10}$ m$^{2}$ s$^{-1}$), mass transfer coefficients for static brine ($h_{\mathrm{NaCl}} = 1.48 \times 10^{-6}$ m s$^{-1}$ and $h_{\mathrm{KCl}} = 1.20 \times 10^{-6}$ m s$^{-1}$), and mass transfer coefficients for dynamic brine ($h_{\mathrm{NaCl}} = 4.48 \times 10^{-6}$ m s$^{-1}$ and $h_{\mathrm{KCl}} = 3.64 \times 10^{-6}$ m s$^{-1}$). 11 For the simulation, the salting times chosen were: 0, 0.25, 0.75, 1.0, 2.5, 6.5, 7.5, 9.5, 10.5, 11.5, 17.5, and 19.0 h. Figure 1 presents the solid generated automatically by the software, showing part of the tetrahedral mesh used and the equidistant points chosen for the study of NaCl and KCl diffusion on the surface of the film formed on the mozzarella cheese.
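For orientation, the coupled diffusion problem to which these parameters belong can be sketched as follows. This is a generic form of Fick's generalized second law for two solutes with a convective (film) boundary condition, written under the assumption that index 1 denotes NaCl and index 2 denotes KCl. The symbols $C_i$ (concentration in the solid), $C_{i,s}$ (concentration at the surface), $C_{i,b}$ (concentration in the bulk brine), and $\mathbf{n}$ (outward unit normal) are notation introduced here; the exact formulation used in the simulation is the one given in the cited references.

```latex
\begin{aligned}
\frac{\partial C_1}{\partial t} &= D_{11}\nabla^2 C_1 + D_{12}\nabla^2 C_2,\\
\frac{\partial C_2}{\partial t} &= D_{21}\nabla^2 C_1 + D_{22}\nabla^2 C_2,\\
-\mathbf{n}\cdot\left(D_{11}\nabla C_1 + D_{12}\nabla C_2\right) &= h_{1}\left(C_{1,s} - C_{1,b}\right),\\
-\mathbf{n}\cdot\left(D_{21}\nabla C_1 + D_{22}\nabla C_2\right) &= h_{2}\left(C_{2,s} - C_{2,b}\right).
\end{aligned}
```

In this form, a larger mass transfer coefficient $h_i$, as in the dynamic brine, corresponds to a weaker film resistance, so the surface concentration approaches the brine value more quickly.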

Artificial neural networks
To analyze the influence of the film formed on the cheese surface, the multilayer perceptron (MLP) network of the artificial neural network module of the Statistica 13.4 software was used. 21 The salting time (hours) was chosen as the continuous target variable, the salt concentrations (mol m$^{-3}$) at points P1-P7 (Figure 1) were selected as continuous input variables, and the static (Z = 1) and dynamic (Z = 2) systems were chosen as the categorical input variable.
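One plausible reading of the nine inputs in the network labels reported in the Results (for example, 9-11-1) is seven surface concentrations plus a two-column, one-of-N coding of Z. The sketch below builds such an input vector. The function name `encode_inputs`, the one-of-N coding, and the example concentrations are assumptions made for illustration and are not details confirmed by the text.

```python
def encode_inputs(concentrations, z):
    """Build an MLP input vector from the P1-P7 concentrations and the system code Z.

    Assumes Z is coded one-of-N: static (Z = 1) -> [1, 0], dynamic (Z = 2) -> [0, 1].
    """
    if len(concentrations) != 7:
        raise ValueError("expected seven concentrations (P1-P7)")
    if z not in (1, 2):
        raise ValueError("Z must be 1 (static) or 2 (dynamic)")
    one_hot = [1.0, 0.0] if z == 1 else [0.0, 1.0]
    return [float(c) for c in concentrations] + one_hot


# Hypothetical surface concentrations (mol m^-3), for illustration only.
example = encode_inputs([310.0, 280.0, 262.0, 251.0, 244.0, 240.0, 238.0], z=2)
print(len(example))  # 9
```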
For network training, 200 epochs, a learning rate of 0.05, and a random subdivision of the samples into three groups were used: 70% for training, 15% for testing, and 15% for validation. Most samples were assigned to training, and the remainder were held out, because the performance of neural networks is measured by their ability to generalize to data that were not used in training, which is the main concern when training neural networks. 19,21 The activation functions for the hidden layer and the output were selected by the application from those available in the library of the module used, namely identity, logistic (logistic sigmoid), hyperbolic tangent, sine, and exponential. 21
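The random 70/15/15 subdivision can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `split_indices`, the fixed seed, and the sample count of 100 are assumptions, and the software used in this work performs the split internally.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split sample indices into training, testing, and validation groups."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # the remainder goes to validation
    return train, test, validation


train, test, validation = split_indices(100)
print(len(train), len(test), len(validation))  # 70 15 15
```

Because the groups are drawn at random, their minimum, maximum, and standard deviation differ from one draw to another.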

Results and Discussion
To verify the influence of the film formed in the biosolid/solution interface, the mozzarella cheese was salted with stirring, in the same concentrations of NaCl and KCl used in the brine without stirring. The cheese samples were collected at the chosen times and the experimental data of NaCl and KCl concentrations were determined.
In the finite element simulation, the same main and cross diffusion coefficients obtained with brine without stirring were used because, according to Borsato et al. 10 and Bordin et al., 7 these parameters do not depend on whether the brine is agitated, since the diffusion coefficients are related to the mass transfer inside the biosolid. However, the mass transfer coefficients in the film differ between the two types of processes, resulting in different concentrations depending on the type of system used. With experimental and simulated data of the salt concentrations throughout the salting process, it was possible to establish the concentration profile. Figure 2 shows the experimental and simulated data.
Figure 2 compares the experimental and simulated concentrations over time (in mol of salt per m$^{3}$ of aqueous solution) found in the mozzarella cheese in the static and dynamic systems. Within the first 5 h of the diffusion process, it is already possible to observe that the brine salts diffuse more quickly in the stirred system (Figure 2b). Figure 2a shows that the static system was more strongly influenced by a resistive film on the surface, so the salts took longer to enter the biosolid.
The stabilization of the sodium chloride concentrations starts before 17 h in the stirred brine and 19 h in the static brine. These times were also sufficient for potassium chloride to reach equilibrium in both the static and dynamic systems, bringing the salt concentrations within the cheese very close to those of the brine used. With the diffusion coefficients (D) and mass transfer coefficients (h), it was possible to simulate by finite elements the salt concentrations on the cheese surface at the points established in Figure 1. Figures 3 and 4 show the concentration profiles of sodium chloride and potassium chloride at points P1-P13 (Figure 1) on the cheese surface during 19.0 and 17.5 h of static and stirred salting, respectively. As the points were equally spaced, and considering the symmetry and the reduction of computational time, only points P1-P7 were considered in the analysis.
The zero-point concentration corresponds to the salts' initial concentration studied in the mozzarella cheese sample before the salting process. For both static and dynamic systems, the salts concentration throughout the salting time was higher at point P1 and lower at point P7. The diffusion profile in static and dynamic systems, for NaCl and KCl (Figures 3 and 4), shows that there is a physical barrier (film) on the external surface of the cheese, since it was observed that the mass transfer was faster when the system with stirring was applied. In addition, the influence of the film formed on the surface was not eliminated even with agitation, otherwise, the concentration of salts on the surface of the solid, after contact with the solution, would instantly reach equilibrium at all points.
The pump used to agitate the system had a flow of 520 L h$^{-1}$. Although this flow does not eliminate the influence of the film on the surface, a more powerful pump could disintegrate the samples during the salting process, impairing the simulation and consequently the process analysis. However, as the salting time increases, the surface concentrations tend to approach the initial brine concentration, which was 598.29 mol m$^{-3}$ of solution for NaCl and 201.34 mol m$^{-3}$ of solution for KCl.
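The brine concentrations quoted above follow from the brine composition given in the methods (5% m/v of salts, 70% NaCl and 30% KCl). The short check below reproduces them under the assumption that rounded molar masses of 58.5 g/mol for NaCl and 74.5 g/mol for KCl were used. These molar masses and the function name `brine_molarity` are not stated in the text.

```python
# Rounded molar masses (g/mol); assumed here, not stated in the source.
MOLAR_MASS = {"NaCl": 58.5, "KCl": 74.5}


def brine_molarity(total_salt_percent_mv, salt_fraction, salt):
    """Return the concentration of one salt in the brine in mol per m^3."""
    # % (m/v) means grams per 100 mL, so multiplying by 10 gives grams per litre.
    grams_per_litre = total_salt_percent_mv * 10.0 * salt_fraction
    return grams_per_litre / MOLAR_MASS[salt] * 1000.0  # mol/L -> mol/m^3


nacl = brine_molarity(5.0, 0.70, "NaCl")
kcl = brine_molarity(5.0, 0.30, "KCl")
print(round(nacl, 2), round(kcl, 2))  # 598.29 201.34
```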
At point P1, the salt concentration in both systems increases from 0.75 h of salting. At the other points, the increase in salt concentration on the mozzarella cheese surface is observed after 2.5 h, and the concentration is always higher at the ends and lower towards the central point of the biosolid surface, represented by point P7. This can be explained by the fact that the film forms more easily on regular, flat surfaces. The film forms on all surfaces, but its thickness is not the same in all positions: it is smaller at the ends and larger at the center due to the greater curvature radius in that position. Since the film acts as a physical barrier, the greater its thickness, the greater the resistance to salt diffusion and the longer the time required to reach concentration equilibrium on the surface. 16,22
To assess the diffusion behavior on the cheese surface, the concentration values (Figures 3 and 4) were tabulated and presented to the automatic regression module of the Statistica 13.4 software. 21 In the regression module, the multilayer perceptron (MLP) neural network was used, testing 4 to 12 hidden layers. Because they act as feature detectors, hidden neurons play an important role in the operation of a perceptron network that learns by backpropagation. As the learning process progresses, the hidden neurons gradually discover the features that characterize the training data. 19,23 The activation functions evaluated for the hidden and output neurons were identity, logistic, hyperbolic tangent, sine, and exponential, which are the only functions provided by the regression module of the automated network search (ANS) of the Statistica 13.4 software. 21 One hundred networks were trained, and the top 5 were selected by the application. Neural networks are highly nonlinear tools that are usually trained using iterative techniques.
The most recommended algorithm for training neural networks is BFGS, proposed independently by Broyden, Fletcher, Goldfarb, and Shanno. 23 This method performs significantly better than more traditional algorithms, such as gradient descent, but it uses more memory and requires more computation per iteration. However, it may require fewer iterations to train a neural network owing to its fast rate of convergence. 19,23 Therefore, before network initialization, the sum of squares (SOS) error function was selected, with BFGS as the training algorithm.
The networks were trained with 70% of the samples for the training group, 15% for testing, and 15% for validation. The sample choice in each group was carried out randomly. The validation step aims to verify the trained network capacity to perform generalizations since artificial neural networks learn a rule using the training examples.
The number of epochs cannot be very high because, when a neural network learns too many input-output examples, it can end up memorizing the training data. This phenomenon is known as overtraining and causes the network to lose its generalization ability. 22,23 According to Haykin, 19 the lower the learning rate parameter, the smaller the changes in the synaptic weights of the network from one iteration to the next, and the smoother the weight trajectory. On the other hand, if the learning rate is very high, the resulting large changes in the synaptic weights can make the network unstable. For this reason, a learning rate of 0.05 was applied with a maximum of 200 epochs. The strategy to create the predictive model was to use the automated network search (ANS) of the Statistica 13.4 software, 21 with the weight decay in the hidden layer and output layer ranging from $10^{-4}$ to $10^{-3}$.
Table 1 shows the maximum, minimum, mean, and standard deviation (StdD) values of the NaCl and KCl concentrations (mol m$^{-3}$) at each point P and the time (hours) used by the perceptron networks for training, testing, and validation, which were chosen randomly by the software. The values in bold are related to the KCl salting process. The lowest and highest standard deviations were observed at points P1 and P3 for the samples used in training and validation, and at P1 and P7 for those used for the network test. The variation observed in the standard deviations of the sample concentrations used for training, testing, and validation is related to the maximum and minimum values chosen by the network and also to the fact that the choice is made randomly.
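The effect of the learning rate described by Haykin can be illustrated on a toy problem. The sketch below applies plain gradient descent to the one-dimensional error function E(w) = w², which is an assumption chosen for simplicity; it is not the network or the training algorithm used in this work, and the rate of 1.10 is a deliberately exaggerated value.

```python
def gradient_descent(learning_rate, w0=1.0, epochs=200):
    """Minimize E(w) = w**2 by gradient descent and return the weight trajectory."""
    w = w0
    trajectory = [w]
    for _ in range(epochs):
        gradient = 2.0 * w  # dE/dw
        w = w - learning_rate * gradient
        trajectory.append(w)
    return trajectory


small = gradient_descent(learning_rate=0.05)  # small, smooth steps toward w = 0
large = gradient_descent(learning_rate=1.10)  # every step overshoots, so |w| grows
print(abs(small[-1]) < 1e-6, abs(large[-1]) > 1.0)  # True True
```

With the small rate, each update shrinks the weight by a constant factor of 0.9, giving a smooth trajectory. With the large rate, the weight changes sign and grows at every step, which is the instability referred to above.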
To assess the performance of the neural network during the iterative execution process, the sum of squares error function was used during training to measure how close the network's predictions are to the chosen target and, therefore, how much weight adjustment should be applied by the training algorithm in each iteration. The error was determined at each training epoch and the information was used to adjust the weights to reduce the error until stabilization. The performance of training, testing, and validation ranged from 0.96 to 0.99, and the error ranged from $4.40 \times 10^{-4}$ to $1.12 \times 10^{-3}$ for training, from $1.05 \times 10^{-4}$ to $8.00 \times 10^{-4}$ for the test, and from $4.75 \times 10^{-4}$ to $1.00 \times 10^{-2}$ for validation.
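A sum of squares error can be written directly from its definition. The sketch below uses the plain sum of squared residuals; any scaling or normalization applied by the software is not described in the text, and the function name and the example values are hypothetical.

```python
def sum_of_squares_error(predictions, targets):
    """Sum of squared differences between network outputs and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


# Hypothetical predicted and actual salting times (h), for illustration only.
predicted = [0.30, 0.70, 1.05]
actual = [0.25, 0.75, 1.00]
error = sum_of_squares_error(predicted, actual)
print(error)
```

An error of zero means that every prediction matches its target, which is why smaller values indicate a better fit.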
If the error in a regression problem does not change, it is an indication that a solution has been found. It assumes values between 0 and 1 with 0 being the best since it means zero training or test error. 19,21 Figures 5a and 5b provide a general indication of how training and test are progressing. They show the number of epochs used to train and test the network with the best performance, revealing that the network needed only 70 epochs to achieve training stability for the NaCl concentration data (Figure 5a) and 103 epochs for training the network with KCl concentration data (Figure 5b) on the cheese surface. An oscillation was also observed in the first 40 training and test epochs.
The sensitivity analysis allows us to evaluate the contribution of each variable to the construction of the predictive models. Therefore, taking into account the 5 trained networks chosen, it was possible to establish an order of importance for each input variable in the general fit of the models. The sensitivity analysis showed that the discrete variable Z, which characterizes whether the system used was static or dynamic, was the most important in the construction of the models to predict the behavior of the salting time (target variable) in the film formed on the surface of the cheese, in the two systems studied. The general order of importance in the construction of the models was Z > P1 > P7 > P6 > P5 > P4 > P3 > P2 for NaCl and Z > P7 > P6 > P5 > P4 > P1 > P3 > P2 for KCl.
Table 2 presents the salting time values predicted by the models obtained using the MLP networks, at each point P of the cheese surface, using the Statistica 13.4 regression module, 21 as well as the mean value, standard deviation, standard error, the p-value obtained from Tukey's test applied to the mean, and the p-value from Levene's test. The values in bold in Table 2 are related to the KCl salting process, and those not in bold are related to the NaCl salting process. The perceptron networks with the best performance had 9 and 11 hidden layers for the NaCl salting time, and from 5 to 12 hidden layers for the KCl salting time. In the network representation, the first number refers to the number of input data, the second to the number of hidden layers, and the last to the number of outputs, which, in this case, refers to the salting time. Networks with the same architecture differ from each other in the activation functions used in the hidden layer and in the output.
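One common way to rank inputs is to compare the model error obtained when an input is replaced by its mean with the error obtained when all inputs are available; a larger ratio indicates a more important input. The sketch below shows this ratio for a toy model. The function names, the toy model, and the data are assumptions for illustration, and the exact procedure used by the software is not detailed in the text.

```python
def sos(model, rows, targets):
    """Sum of squares error of a model over a data set."""
    return sum((model(row) - t) ** 2 for row, t in zip(rows, targets))


def sensitivity_ratios(model, rows, targets):
    """Error with one input replaced by its mean, divided by the baseline error."""
    baseline = sos(model, rows, targets)
    ratios = []
    for j in range(len(rows[0])):
        mean_j = sum(row[j] for row in rows) / len(rows)
        perturbed = [row[:j] + [mean_j] + row[j + 1:] for row in rows]
        ratios.append(sos(model, perturbed, targets) / baseline)
    return ratios


def toy_model(row):
    """Hypothetical fitted model in which the first input dominates."""
    return 3.0 * row[0] + 0.5 * row[1]


rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
noise = [0.1, -0.1, 0.1, -0.1]  # keeps the baseline error above zero
targets = [toy_model(r) + e for r, e in zip(rows, noise)]
ratios = sensitivity_ratios(toy_model, rows, targets)
print(ratios[0] > ratios[1] > 1.0)  # True
```

Sorting the inputs by decreasing ratio gives an importance order of the kind reported above.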
[Table 2 column headers recovered from the extraction: network architectures 9-11-1, 9-5-1, 9-9-1, 9-9-1, 9-9-1, 9-12-1, 9-11-1, 9-12-1, 9-11-1, 9-6-1; statistics StdD, StdE, pT, pL.]
The performance of neural networks is measured by their ability to predict unseen data, that is, values that were not used during training. Thus, random test samples chosen by the software were used to verify the model performance and its generalization ability. To rule out a coincidence in the test results, a set of unseen validation data was used as an extra check of model performance. 19,23,24 Tukey's test showed no significant difference, at the 5% level, between the means of the times obtained by the 5 networks with the best performance for NaCl. For the diffusion using the KCl concentration data at points P, a significant difference was observed at the same significance level only for 2 samples used in training, 1 in testing, and none in model validation, showing that the model can be used for predictive purposes. The standard deviation and standard error values were very low, making the statistical test very rigorous, since the mean values are very close to the times used in the salting process, even those that were not significant at the 5% level.
For the dependent variable (target), variance analysis of the absolute deviations from the values of the respective mean times was performed with the application of Levene's test. For most of the observed cases (Table 2), the values were not significant, indicating that the hypothesis of homogeneous variances should not be rejected, except for the times 0.25 h for the KCl in the static system and 17.5 h for NaCl in the dynamic system.
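Levene's test, as described above, amounts to an analysis of variance on the absolute deviations of each observation from its group mean. The sketch below computes the resulting W statistic in plain Python; the function name and the example groups are hypothetical, and the p-value, which comes from comparing W with an F distribution with (k - 1, N - k) degrees of freedom, is omitted here.

```python
def levene_statistic(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        mean_g = sum(g) / len(g)
        deviations.append([abs(x - mean_g) for x in g])
    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: the first pair has equal spread, the second does not.
w_equal = levene_statistic([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
w_unequal = levene_statistic([[1.0, 2.0, 3.0], [0.0, 10.0, 20.0]])
print(w_equal < w_unequal)  # True
```

A W value near zero is consistent with homogeneous variances, whereas a large W leads to rejection of that hypothesis.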

Conclusions
In this work, we studied the salting of mozzarella cheese by immersion in aqueous solution, with and without agitation, with partial replacement of sodium chloride by potassium chloride, using computational tools such as 3D modeling by the finite element method (FEM) and MLP neural networks to evaluate film formation on the cheese surface and to demonstrate its influence on mass transfer and diffusion time. The combination of these tools can help improve food processing techniques and help industries produce cheese with nutritional advantages.
The sensitivity analysis of the neural network used revealed that the discrete variable Z was the most important in the regression construction. The applied statistical test showed that there is no significant difference in the validation of the predictive model of salting time that was used as the target variable.