Machine learning assisted improved desalination pilot system design and experimentation for the circular economy

production against hot water temperature (HWT), varied from 38 to 70 °C, and feed water temperature (FWT), varied from 34 to 42 °C, while the feed flow rate (FFR) in the three stages, i.e., FFR-S1, FFR-S2 and FFR-S3, is varied from approximately 3.6 to 8.7 LPM. The compiled dataset is used to build process models of the MED system with three ML algorithms, i.e., Artificial Neural Network (ANN), Support Vector Machine (SVM), and Gaussian Process Regression (GPR), under rigorous hyperparameter optimization. GPR outperformed ANN and SVM, achieving an R² of 0.99 and an RMSE of 0.026 LPM. A Monte Carlo technique-based variable significance analysis revealed that HWT has the largest effect on distillate production, with a percentage significance of 95.6 %. A genetic algorithm (GA) is then used to maximize distillate production with the GPR model embedded in the optimization problem. The GPR-GA-estimated maximum distillate production is obtained at HWT = 70 ± 0.5 °C, FWT = 40 ± 2.5 °C, FFR-S1 = 6 ± 2.6 LPM, FFR-S2 = 7 ± 1 LPM and FFR-S3 = 7 ± 1 LPM. The ML-GA-based analysis and optimization of the MED system can boost distillate production, promoting operational excellence and the circular economy in the desalination sector.


Introduction
Clean drinking water is among the most formidable challenges, as over 2.2 billion people lack access to it [1]. The situation is exacerbated by surging demand, which is expected to exceed 6000 billion cubic meters by 2030, up from 4200 billion cubic meters in 2015, driven by population growth, industrial development, and rising quality-of-life standards [2]. To deal with water scarcity, desalination has emerged as the only practical and feasible solution [3]. Desalination systems also contribute towards the water circular economy. Consequently, different desalination systems operate worldwide, mainly Multistage Flash (MSF), Reverse Osmosis (RO), Multi-effect Desalination (MED), and others (electrodialysis, nanofiltration, ultrafiltration, membrane distillation, etc.) [4]. Meanwhile, installed capacity has surged and is predicted to grow further, as shown in Fig. 1 [5,6]. Thermal-based systems have also grown significantly over the last decades owing to their low pretreatment needs, robust design, low maintenance, and versatile feed-handling ability [7]. Among thermal-based systems, multi-effect desalination (MED) has shown promising characteristics, including better heat transfer, high working efficiency, the capability to harness low-grade energy, and flexibility for hybridization with other systems [8], resulting in a wide range of installations worldwide as standalone as well as hybrid systems [9]. However, these systems have higher energy consumption, operating costs, and CO₂ emissions than membrane-based systems [10]. They operate at only 10 to 13 % of their thermodynamic limit and are therefore the subject of extensive research for improved performance [11].
Besides the above-mentioned established technologies, significant research interest has been observed in biomass-based adsorbent materials for water treatment [12,13]. These can be developed from agricultural residues [14], natural fibres [15], wood cellulosics [16], composite materials [17] and activated carbons [18]. Hossain et al. [19] proposed a cost-effective adsorbent material for the effective adsorption and recovery of the rare earth metal Yb(III). In another study [20], they developed a composite adsorbent via an eco-friendly process for the successful detection and removal of Cd(II). Similarly, they proposed a ligand-immobilized nanocomposite adsorbent for efficient cerium(III) detection and recovery [21], showing that the adsorbent was also highly selective to Ce(III) with fast kinetics. Su et al. [22] proposed a microbubble floating-extraction mechanism with a molybdenum separation efficiency above 99 %. Awual [23] prepared novel nanocomposite materials for optical Hg(II) detection/removal and showed that they are promising candidates for in situ environmental remediation. Similarly, ligand-based facial conjugate materials were developed for treating copper(II) [24], exhibiting high sensitivity and selectivity to Cu(II) ions.
Desalination plays a critical role in the water circular economy by minimizing water waste and promoting the responsible use and reuse of water resources. Desalination technologies contribute to this by addressing water scarcity and enhancing water resource management in several ways. Earlier research on MED desalination systems includes experimental and theoretical studies [25]. The experimental studies covered system development, hybridization with other systems, and performance investigation [26]. The theoretical studies include numerical modelling, optimization, parametric analysis, and retrofitting of different components for performance improvement [27,28]. These conventional mathematical models require extensive domain knowledge, complex equation development, and calculation of the associated properties [29]. Design is particularly difficult under unpredictable demand and fluctuating energy sources such as renewable energy, cogeneration plants, and hybrid systems [30]. Moreover, conventional models rely on assumptions that compromise the accuracy of the predicted characteristics [31], and developing and solving the mathematical models for performance prediction and optimization is computationally expensive. Owing to these limitations, and to the ability of machine learning models to approximate complex function spaces with high accuracy, reduced computational requirements, and the flexibility to update the trained model, machine learning is increasingly being applied to water desalination systems, as summarized below [32].

Desalination and machine learning
Machine learning (ML) is applied to desalination systems to benefit from technological advancements and accurate data-driven methods [33]. ML tools can be used for the modelling and analysis of desalination systems such as MED, supporting informed decision-making and the development of operational strategies [34]. Considerable work has deployed ML tools to model different performance indicators of desalination systems. For instance, Son et al. [35] proposed a convolutional neural network combined with a long short-term memory structure for pH prediction in water treatment systems. Their framework achieved higher accuracy (R² ≥ 0.998) than conventional numerical models (R² < 0.309). Bonny et al. [36] developed a deep reinforcement learning-based method to optimize the trans-membrane pressure of a reverse osmosis system and reported improvements in permeate and salt rejection (99 %). Shahane et al. [37] optimized the evaporator design for higher water production, better heat transfer, and lower scale formation using an artificial intelligence-based non-dominated sorting genetic algorithm, achieving optimal results with an average relative error below 3.24 % that can support further improvement. Krzywanski et al. [38] used artificial intelligence to identify the optimal operating conditions that achieve a water uptake of 1.65 g/g in a fluidized bed adsorption cooling and desalination system. Similarly, He et al. [39] reported a 10 % improvement in freshwater productivity through AI-based optimization of renewable energy-driven desalination systems.
Karambasti et al. [40] optimized a hybrid Stirling engine integrated with an MED system using the analytic hierarchy process, which increased energy and water productivity and thus reduced the operating costs of the system. The optimal system can deliver 2.58 kW of power and 19.92 m³/day of water at costs of 0.29 $/kWh and 1.6 $/m³, respectively. Pombo et al. [41] designed a small, modular nuclear fission chamber for use in a hybrid plant, using solar and wind energy to achieve an optimal hybrid plant with improved safety and energy efficiency. Salem et al. [42] optimized a hybrid solar still and humidification-dehumidification system using a multilayer perceptron and showed that the model can predict and enhance system performance with less time and fewer resources. Shakibi et al. [43] developed and optimized a cogeneration plant combining a gas turbine cycle, a heliostat field, and a thermal vapor compression MED system using multi-objective grey wolf optimization, support vector regression, and grasshopper optimization algorithms, reporting an exergy efficiency of up to 45.6 %.
The literature reviewed above suggests that ML tools in the water sector have improved system performance, process control, and resource planning under fluctuating conditions. However, ML-based process design and performance improvement of MED systems for maximum distillate production are rarely reported, possibly because of the fabrication challenges of MED systems and the unavailability of design and process data. More importantly, the literature lacks a comprehensive analytical framework with a step-by-step approach for ML-assisted analysis and optimization to enhance MED system performance, which is a major obstacle to the adoption of ML in the desalination sector. Experimental verification of an ML-estimated solution would add to the knowledge base of applied ML in desalination and demonstrate its effectiveness for improving the performance of desalination systems.
Fig. 1. Global desalination trend and prediction [5,6].
W.M. Ashraf et al.
Thus, the current study provides an ML-based comprehensive analytical framework that deploys advanced modelling algorithms and an optimization technique to maximize water production from the MED system. First, an extensive experimental investigation of an in-house MED system is conducted under different operating conditions and the dataset is compiled. ANN, SVM, and GPR process models for distillate production are then constructed under rigorous hyperparameter optimization to obtain a well-predictive model that approximates the operating behavior of the designed MED system. A Monte Carlo sensitivity analysis is conducted to gain insight into the significance of each variable for distillate production, which is missing in the literature. A model-based optimization by genetic algorithm is carried out to estimate the optimized operating conditions for maximum water generation, and the results are verified on the experimental set-up. This closed-loop, data-driven modelling, optimization, and experimental verification for the performance enhancement of the designed MED system is the major novelty of this work, and the framework may help industrial practitioners and engineers pursue operational excellence in their systems, contributing to addressing water scarcity and sustainability challenges.

Materials and methods
The methodology adopted in this work is shown graphically in Fig. 2. The first step involves detailed experimentation and data collection at assorted operating conditions of the MED system. The data compiled from these experiments are then visualized and processed to eliminate outliers. Next, ML models including ANN, SVM and GPR are developed under extensive hyperparameter tuning. The best predictive model is then used to analyze the significance of the variables and, with a genetic algorithm, to estimate the operating conditions that maximize distillate production. Finally, the model-based optimized operating conditions are verified on the experimental system. Each stage is described in detail below.

Experimentation and data collection
Fig. 2. A stepwise methodology is adopted in this study to maximize the distillate production from the MED system. The experiments are conducted on the experimental set-up, and the collected dataset is compiled, visualized and used to build process models by ML. The significance order of the input variables of the MED system is established by the Monte Carlo technique. The ML model is embedded in the optimization problem to maximize distillate production, and the problem is solved by a genetic algorithm. The estimated results for maximum distillate production are also verified on the experimental setup.
Extensive experimentation is conducted on a Multi-Effect Desalination (MED) test rig, shown in Fig. 3. The rig contains a steam generator, evaporator stages, a condenser, a cooling tower, a brine cooler, collection tanks, pumps, and a control system. Hot water is directed to the steam generator in a closed loop, and the steam produced by the generator is supplied to the first evaporator. The saltwater feed is sprayed onto the outer surface of the evaporators using a pump and nozzles, and the feed flow rate is controlled with throttling valves according to demand. In the first evaporator, the steam inside the tubes transfers heat to the feed water and evaporates a portion of it. The vapour produced in the first evaporator is directed into the tubes of the next evaporator, and the process repeats. Meanwhile, the steam inside the tubes condenses and is collected as distillate (freshwater). The brine enters the subsequent effect, where it produces some additional vapour by flashing caused by the pressure drop between the evaporators. The vapour produced in the final evaporator is sent to the condenser, which is cooled by a cooling tower, to produce distillate. The distillate and brine from all evaporator stages are collected via headers in the distillate collection tank and brine collection tank, respectively. The brine is partially recirculated and mixed with the feed, which improves the system's recovery ratio. Turbine flow meters measure the flow rates in the system, and a vacuum pump maintains the vacuum required for continuous evaporation.
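The statement that partial brine recirculation improves the recovery ratio follows from how that ratio is commonly defined. A standard definition is given below; the symbols are introduced here for illustration and are not taken from the original text, and the exact definition used for this rig is an assumption.

$$
\mathrm{RR} = \frac{\dot{V}_{d}}{\dot{V}_{sw}}
$$

Here $\dot{V}_{d}$ is the total distillate flow rate and $\dot{V}_{sw}$ is the fresh seawater intake, excluding the recirculated brine. For the same distillate output, recirculating part of the brine reduces the fresh seawater intake required, which raises RR.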
The test rig is fully instrumented to monitor and log real-time data. Because the system operates under vacuum, the instruments are vacuum-grade. Table 1 summarizes the instrumentation details of the system; an Agilent data logger is used to record the data.

Data-visualization & pre-processing
The MED system was operated under steady-state conditions for diverse operating scenarios to ensure stable distillate production in each stage [45]. The values of various thermo-physical operating parameters in the second and third stages were observed to be strongly influenced by the operating parameters of the first stage. Therefore, the important control parameters are taken from the 1st stage for the ML-assisted analysis, for two reasons: (1) the 1st stage operating parameters drive the MED system, and (2) ML models work well when the input variables are causal and independent [46].
The operating parameters taken from the 1st stage of the MED system are the hot water temperature (HWT) and the feed water temperature (FWT), since these are independently controlled variables. They are also among the most critical parameters in MED operation: a higher hot water temperature means higher energy input and higher productivity, while a higher feed water temperature reduces the preheating requirement, which is significant when waste heat is used. The feed flow rates (FFR) to the three stages, which represent the seawater feed supplied to each stage of the MED system, are also included and denoted FFR-S1, FFR-S2, and FFR-S3, respectively. The feed flow also affects productivity but must be controlled carefully: a very low feed flow rate causes dry patches in the evaporator and reduces system performance, whereas a very high flow rate lowers evaporation because more energy goes into sensible heating of the water. Finally, the output of the MED system is the distillate water production (hereafter, distillate production), which is modelled on the selected input variables.
The collected data must first be visualized to investigate the data-distribution space. A violin plot is an efficient way to do this, as it shows the data distribution together with a curve of its density. Visualizing the data distribution over the operating ranges of the variables is an important step before building ML models because it helps ensure data quality. After visualization, the collected data are normalized for the development of the ML models. Normalization is essential because the input variables may have very different operating ranges; scaling all variables into equal ranges gives each input a fair chance to develop its association with the output variable, which could otherwise be biased towards particular input variable(s). Among the data-normalization techniques reported in the literature [47], min-max normalization has shown promising results in model development [48]. It is defined as:
$$u' = \frac{u - u_{\min}}{u_{\max} - u_{\min}}$$
Here, $u$, $u_{\min}$ and $u_{\max}$ are the actual, minimum, and maximum values of the variable in the dataset, and $u'$ is the corresponding normalized value, mapped into the [0, 1] range. The normalized training dataset thus obtained is then used for ML modelling.
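A minimal sketch of the min-max scaling above in plain Python. The function names are hypothetical, and the sketch assumes that $u_{\min}$ and $u_{\max}$ are taken from the training data and reused for any new observations (for example, candidate operating points proposed by an optimizer) so that every input shares the same mapping. The HWT values are illustrative points within the reported 38 to 70 °C range, not measured data.

```python
def fit_min_max(column):
    """Return (u_min, u_max) for one variable, taken from the training data."""
    return min(column), max(column)


def min_max_scale(values, u_min, u_max):
    """Map values into [0, 1] with u' = (u - u_min) / (u_max - u_min)."""
    span = u_max - u_min
    if span == 0:
        raise ValueError("a constant variable cannot be min-max scaled")
    return [(u - u_min) / span for u in values]


def inverse_min_max(scaled, u_min, u_max):
    """Recover original units, e.g. to report an optimized operating point."""
    return [s * (u_max - u_min) + u_min for s in scaled]


# Illustrative HWT values (°C) within the reported 38-70 °C range, not measured data
hwt = [38.0, 50.0, 62.0, 70.0]
hwt_min, hwt_max = fit_min_max(hwt)
hwt_scaled = min_max_scale(hwt, hwt_min, hwt_max)
print(hwt_scaled)  # [0.0, 0.375, 0.75, 1.0]
```

Keeping the inverse transform alongside the forward one matters later: an optimizer working in scaled space returns scaled values that must be mapped back to °C and LPM before they can be set on the rig.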

ML modelling algorithms
Three state-of-the-art ML models, i.e., ANN, SVM, and GPR, are employed to develop data-driven models of the MED system because of their strong modelling capabilities and suitability for the system under consideration. ANN is an efficient data-modelling tool that loosely mimics the functioning of the human brain. The multilayer perceptron trained with backpropagation, one of the network architectures most commonly used by the scientific community, has proven effective for developing engineering solutions for complex, large-scale industrial systems [49]. ANN learns quickly and effectively and can capture the non-linearities present in large volumes of data [50]. Moreover, backpropagation is a supervised technique that can yield a generalized model of the given system, provided an optimized network architecture is constructed [51].
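To make "multilayer perceptron trained with backpropagation" concrete, the sketch below trains a one-hidden-layer network with a tanh activation on mean squared error using full-batch gradient descent in NumPy. It is a minimal illustration, not the MATLAB network the authors trained: the layer size, learning rate, epoch count, and the synthetic two-input data are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def train_mlp(x, y, hidden=8, lr=0.1, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh MLP on mean squared error by backpropagation."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    n = len(x)
    losses = []
    for _ in range(epochs):
        # forward pass
        h = np.tanh(x @ w1 + b1)
        err = h @ w2 + b2 - y
        losses.append(float(np.mean(err**2)))
        # backward pass: gradients of the MSE, output layer first
        g_out = 2.0 * err / n
        g_w2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ w2.T) * (1.0 - h**2)  # derivative of tanh
        g_w1, g_b1 = x.T @ g_h, g_h.sum(axis=0)
        # full-batch gradient-descent update
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def predict(params, x):
    """Forward pass of the trained network, returning a flat prediction vector."""
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2).ravel()


# Synthetic two-input data (not MED data), already in [0, 1] like min-max scaled inputs
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, (200, 2))
y = x[:, 0] ** 2 + 0.5 * x[:, 1]
params, losses = train_mlp(x, y)
```

The number of hidden layers, the activation function, and the regularization strength are exactly the kind of architecture choices the text says must be optimized for the network to generalize.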
SVM, a supervised learning technique, has demonstrated excellent performance in modelling complex systems [52,53]. The normalized training data are transformed into a kernel-induced feature space in which a hyperplane is fitted within the decision boundaries, and the input-output relationships are then captured through the Karush-Kuhn-Tucker (KKT) conditions [54]. SVM therefore offers good generalization capability and high prediction accuracy. A competitive advantage of SVM over ANN is that its computational complexity is independent of the input-space dimension and, because its training problem is convex, it cannot be trapped in local extrema [55].
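For regression, the "decision boundaries" around the fitted hyperplane are the ±ε tube of support vector regression (SVR): deviations inside the tube cost nothing, and only points outside it contribute to the objective. The sketch below shows the standard ε-insensitive loss and the primal objective of a linear SVR in plain Python. It is illustrative only: the kernel version and the KKT-based dual solver are not shown, the values of C and ε are arbitrary, and the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: errors inside the ±epsilon tube cost nothing."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        total += max(0.0, abs(t - p) - epsilon)
    return total / len(y_true)


def linear_svr_objective(w, b, xs, ys, c=1.0, epsilon=0.1):
    """Primal linear SVR objective: 0.5*||w||^2 + C * (sum of tube violations)."""
    preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
    violations = epsilon_insensitive_loss(ys, preds, epsilon) * len(ys)
    return 0.5 * sum(wi * wi for wi in w) + c * violations


# The first error (0.05) lies inside the tube; only 0.3 - 0.1 = 0.2 of the second counts
print(epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.3]))
```

The ||w||² term keeps the fitted function flat, and C trades that flatness against tube violations; both C and ε are typical SVR hyperparameters to tune.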
Table 1
Instrumentation details for MED system [44].
GPR is another powerful ML modelling algorithm that can approximate the complex function underlying a dataset with high accuracy. GPR is a probabilistic, non-parametric model built on kernels and is useful for interpolation tasks in high-dimensional input spaces [56]. A Gaussian process (GP) is a collection of random variables, any finite subset of which has a joint Gaussian distribution. In a GPR model, the response is explained by latent variables f(x_i), i = 1, 2, …, N, drawn from a Gaussian process, together with explicit basis functions. The covariance function of the latent variables captures the smoothness of the output variable, while the basis functions project the input variables x into a feature space. An exploration-exploitation strategy is implemented, and its trade-off is controlled while modelling the underlying function from the training dataset. The trained GPR model can then predict the response for new input observations.
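The core GPR computation can be sketched in a few lines of NumPy: build a covariance (kernel) matrix over the training inputs, factor it with a Cholesky decomposition, and obtain the posterior mean and standard deviation at new inputs. This is a minimal sketch, not the MATLAB implementation used in the study. It assumes a squared-exponential kernel with fixed hyperparameters (in practice these are tuned), approximates the constant basis function by subtracting the sample mean of y, and uses a synthetic one-dimensional example. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of a and b."""
    sq_dist = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_predict(x_train, y_train, x_test, length_scale=1.0,
                signal_var=1.0, noise_var=1e-6):
    """Posterior mean and standard deviation of a GP fitted to mean-centred y."""
    y_offset = y_train.mean()  # constant basis function approximated by the sample mean
    k = rbf_kernel(x_train, x_train, length_scale, signal_var)
    chol = np.linalg.cholesky(k + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_offset))
    k_star = rbf_kernel(x_test, x_train, length_scale, signal_var)
    mean = y_offset + k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = signal_var - np.sum(v**2, axis=0)  # prior variance minus the explained part
    return mean, np.sqrt(np.maximum(var, 0.0))


# Synthetic 1-D example (not MED data)
x_train = np.linspace(0.0, 1.0, 8)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
x_test = np.array([[0.5], [3.0]])
mean_train, std_train = gpr_predict(x_train, y_train, x_train, length_scale=0.2)
mean_test, std_test = gpr_predict(x_train, y_train, x_test, length_scale=0.2)
```

The predicted standard deviation is what makes GPR probabilistic: it is close to zero at the training points and grows back towards the prior value far from the data (at x = 3 here), which is where confidence intervals around GPR predictions come from.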

Evaluation criteria
Two commonly used statistical measures, the coefficient of determination (R²) and the root mean square error (RMSE), are selected to evaluate the ML models during model development. They are defined as:
$$R^2 = \frac{\left[\sum_{i=1}^{N}(y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})\right]^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2 \, \sum_{i=1}^{N}(\hat{y}_i - \bar{\hat{y}})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$$
Here, $y_i$ is the observed (actual) value of the output variable, $\hat{y}_i$ is the corresponding value predicted by the model, i = 1, 2, …, N, and N is the total number of data points in the dataset, while $\bar{y}$ and $\bar{\hat{y}}$ are the means of the actual and predicted values, respectively. R² measures the predictive accuracy of the ML model and ranges from 0 (poor accuracy) to 1 (excellent accuracy), whereas RMSE quantifies the error between the true values and the model predictions and is to be minimized.
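Both metrics can be computed directly from the definitions above. The sketch below implements R² in the squared-correlation form, which matches the use of both the observed and predicted means. This form is an assumed reconstruction; the other common definition, 1 − SS_res/SS_tot, can become negative for poor models. The function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_squared_correlation(y_true, y_pred):
    """R^2 as the squared Pearson correlation between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


y_obs = [1.0, 2.0, 3.0, 4.0]
y_biased = [2.0 * v + 1.0 for v in y_obs]  # perfectly correlated but badly scaled and offset
print(r2_squared_correlation(y_obs, y_biased), rmse(y_obs, y_biased))
```

The example shows why the two metrics are reported together: the squared-correlation R² equals 1 for predictions that are perfectly correlated but systematically wrong, while RMSE exposes the error.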

Variables significance analysis
The ML models are functional maps between the input and output variables, shaped by the data associated with those variables. The next step is to investigate how varying each input variable over its operating range affects the output variable, i.e., the significance of each input towards the output. Studying this significance in the constructed functional map reveals the sensitivity of the output variable to changes in each input variable.
The Monte Carlo technique is a global sensitivity analysis method for analyzing the impact of the input variables on the output variable. A large number of virtual experiments are constructed and simulated with the developed ML model so that the impact of each input variable on the output variable can be investigated comprehensively, and the percentage significance order of the input variables towards the output variable can then be established. Details of the Monte Carlo experimental design, the simulation with the ML model, and the establishment of the variables' significance order can be found in [55,57].

Estimating optimized operating conditions for maximizing the distillate production from the MED system
A genetic algorithm is a nature-inspired metaheuristic global optimization algorithm that can produce good-quality solutions in a reasonable time with modest computational resources [58]. It operates on a population of candidate solutions, known as individuals, each representing a set of values of the decision variables. A random initial population is generated and the fitness of each individual is evaluated. The individuals then undergo crossover and mutation to produce offspring, the fitness of each offspring is recorded, and the fittest individuals are selected for the next generation [59]. This process repeats, modifying the candidate solutions in each generation, until either the maximum number of generations is reached or the convergence criteria are met. The optimized operating conditions of the input variables that maximize distillate production from the MED system can thus be estimated [60].
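The loop described above (random initial population, fitness evaluation, crossover, mutation, selection) can be sketched as a small real-coded GA in plain Python. The operators are assumed choices and are not specified in the text: tournament selection, blend (arithmetic) crossover, Gaussian mutation scaled to each variable's range, and elitism. The sketch also stops only after a fixed number of generations. The function name and all parameter values are hypothetical, and the toy objective stands in for the trained model.

```python
import random


def genetic_algorithm(fitness, bounds, pop_size=40, generations=100,
                      crossover_rate=0.9, mutation_rate=0.1, elite=2, seed=0):
    """Maximize `fitness` over the box `bounds` with a simple real-coded GA."""
    rng = random.Random(seed)

    def clip(ind):
        # keep every decision variable inside its operating range
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(ind, bounds)]

    def tournament(pop, scores, k=3):
        # pick k random individuals and return the fittest of them
        picks = rng.sample(range(len(pop)), k)
        return pop[max(picks, key=lambda i: scores[i])]

    # random initial population inside the bounds
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        new_pop = [pop[i][:] for i in ranked[:elite]]  # elitism: keep the best unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                a = rng.random()  # blend (arithmetic) crossover weight
                child = [a * x1 + (1.0 - a) * x2 for x1, x2 in zip(p1, p2)]
            else:
                child = p1[:]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:  # Gaussian mutation
                    child[d] += rng.gauss(0.0, 0.1 * (hi - lo))
            new_pop.append(clip(child))
        pop = new_pop
    scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


best_x, best_f = genetic_algorithm(
    lambda v: -(v[0] - 3.0) ** 2 - (v[1] + 1.0) ** 2,  # toy objective, optimum at (3, -1)
    bounds=[(-5.0, 5.0), (-5.0, 5.0)],
)
```

In the setting of this study, `fitness` would be the trained GPR model's predicted distillate production, and `bounds` the operating ranges of HWT, FWT, FFR-S1, FFR-S2 and FFR-S3. The `clip` step enforces those ranges as box constraints.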

Investigation of the optimized operating conditions
In the last step, the optimized operating conditions for distillate production are tested on the experimental set-up to check the effectiveness of the model-based analysis for maximizing distillate production. Good agreement between the measured distillate production and the model-estimated distillate production at the determined operating conditions would validate the ML-based modelling and optimization analysis and give the desalination community confidence in applying ML-based analytics to desalination systems.

Visualizing the data-distribution space of input-output variables
The experiments are designed over wide operating ranges of the input variables of the commissioned MED system and performed on the experimental set-up. Because distillate production fluctuates within each experiment, its mean, median, and mode values are calculated. The complete set of experiments is carried out on the commissioned MED system and the distillate production dataset is compiled.
Fig. 4 presents the data-distribution profiles of the input variables, i.e., hot water temperature (HWT, °C), feed water temperature (FWT, °C), feed flow rate in stage 1 (FFR-S1, LPM), feed flow rate in stage 2 (FFR-S2, LPM), and feed flow rate in stage 3 (FFR-S3, LPM), as well as the output variable (distillate production) of the MED system. HWT is varied from 38 °C to 70 °C, a wide operating range over which to investigate distillate production. HWT provides the thermal energy input for evaporating the seawater, whose vapour is condensed to produce distillate; operating the MED system over this wide range of HWT therefore allows the distillate production performance of the system to be investigated. Another important input variable is FWT, the temperature of the seawater, which is maintained between 34 °C and 42 °C with a mean value of 37 °C. FWT also influences the energy consumption of the MED system, since the heat duty depends on the initial temperature of the seawater sprayed in the stages. FFR-S1 is varied from 3.6 LPM to 8.7 LPM in the first stage, whereas FFR-S2 and FFR-S3 are maintained between about 5.8 LPM and 8.1 LPM. The feed flow rates show asymmetric data-distribution profiles, with mean values of 7.6 LPM, 6.8 LPM and 7.0 LPM for FFR-S1, FFR-S2 and FFR-S3, respectively. Overall, the experimental data are well distributed over the operating ranges of the MED system variables. The initial quality of the collected data is ensured, since no outliers are present and the data distribution is continuous over the operating ranges of the variables. The dataset can therefore be used to train ML-based process models that approximate the distillate production behavior of the MED system.

Development of ANN, SVM, and GPR-based process models for the MED system
Three ML modelling algorithms, ANN, SVM and GPR, are trained to predict distillate production from the operating conditions of the MED system. ML model development and simulation are carried out in MATLAB R2021b. The hyperparameters of the ML models are rigorously and extensively tuned to achieve good prediction performance. For ANN, the hyperparameters are explored in the following ranges/categories: number of fully connected layers: 1-3; activation function: ReLU, hyperbolic tangent and tangent sigmoid; regularization strength: […] profiles are also plotted along the edges of Fig. 5. Both the ANN and GPR predictions appear to approximate the true data-distribution profile of distillate production. However, comparing the three models, GPR clearly develops the best functional mapping for distillate production, with the lowest predictive error among the three. Moreover, the GPR predictions lie more tightly within the 95 % confidence interval than those of ANN and SVM. GPR is therefore selected for the subsequent analyses described in the following sections.
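One common way to tune hyperparameters "rigorously" is to score each candidate by k-fold cross-validated RMSE and keep the best one. The text does not state which search strategy MATLAB was configured to use, so the sketch below is only a generic illustration: it shows a grid search with k-fold cross-validation in NumPy, where a polynomial degree on synthetic data stands in for ANN, SVM, or GPR hyperparameters. All function names are hypothetical.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle the sample indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def cv_rmse(x, y, fit, predict, param, k=5, seed=0):
    """Mean validation RMSE of a model trained with hyperparameter `param`."""
    folds = kfold_indices(len(x), k, seed)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(x[train_idx], y[train_idx], param)
        resid = predict(model, x[val_idx]) - y[val_idx]
        errors.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(errors))


def grid_search(x, y, fit, predict, grid, k=5, seed=0):
    """Score every candidate on the same folds; return the lowest-CV-RMSE one."""
    scores = {p: cv_rmse(x, y, fit, predict, p, k, seed) for p in grid}
    best = min(scores, key=scores.get)
    return best, scores


# Toy stand-in: the "hyperparameter" is a polynomial degree (synthetic data, not MED data)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 60)
y = 3.0 * x**2 + rng.normal(0.0, 0.01, 60)
best_degree, cv_scores = grid_search(
    x, y,
    fit=lambda xs, ys, d: np.polyfit(xs, ys, d),
    predict=lambda coeffs, xs: np.polyval(coeffs, xs),
    grid=[1, 2, 3, 4],
)
```

Reusing the same seed means every candidate is scored on identical folds, so differences in CV RMSE reflect the hyperparameter rather than the data split. For continuous or high-dimensional search spaces, random search or Bayesian optimization is usually preferred to an exhaustive grid.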

Monte Carlo technique-based variables significance analysis
The Monte Carlo technique-based variable significance analysis explores the operating ranges of the input variables and constructs virtual experiments that are simulated with the developed ML model. Usually, a large number of simulated experiments covering different possible operating conditions of the input parameters are constructed to investigate the system's response. In this work, the operating range of the input variable whose significance is being evaluated is divided into 10 steps, and 1000 observations are randomly generated within the operating ranges of the other input variables. Thus, 1000 simulated experiments are constructed at each constant value of the input variable under investigation, and the procedure is repeated until the complete operating range of that variable has been explored. The simulated experiments are predicted with the developed GPR model, and the process is replicated for each input variable to determine its significance.
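The procedure above (10 levels per variable, 1000 random operating points for the other inputs, predictions from the trained model) can be sketched in NumPy. The text does not specify how the "parametric effect" of each variable is measured before normalization. This sketch assumes the effect is the spread (max minus min) of the mean predicted output across the 10 levels, normalized to percentages. It also reuses the same 1000 random points at every level (common random numbers) to reduce Monte Carlo noise, which is a design choice not stated in the original. The function name and the toy model are hypothetical; in the study, `model` would be the trained GPR's prediction function.

```python
import numpy as np


def mc_significance(model, bounds, n_levels=10, n_samples=1000, seed=0):
    """Percentage significance of each input, estimated by Monte Carlo simulation.

    model: callable mapping an (n, d) array of inputs to n predictions.
    bounds: list of (low, high) operating ranges, one per input.
    """
    rng = np.random.default_rng(seed)
    lows = np.array([lo for lo, _ in bounds], dtype=float)
    highs = np.array([hi for _, hi in bounds], dtype=float)
    # the same random operating points are reused at every level (common random numbers)
    base = rng.uniform(lows, highs, size=(n_samples, len(bounds)))
    effects = []
    for j in range(len(bounds)):
        level_means = []
        for level in np.linspace(lows[j], highs[j], n_levels):
            samples = base.copy()
            samples[:, j] = level  # pin input j, keep the other inputs random
            level_means.append(float(np.mean(model(samples))))
        # assumed effect measure: spread of the mean response across the levels
        effects.append(max(level_means) - min(level_means))
    effects = np.array(effects)
    return 100.0 * effects / effects.sum()


def toy_model(x):
    """Hypothetical stand-in: strong dependence on x0, weak on x1, none on x2."""
    return 10.0 * x[:, 0] + 1.0 * x[:, 1]


significance = mc_significance(toy_model, [(0.0, 1.0)] * 3)
```

For this linear toy model the sketch returns roughly 90.9 %, 9.1 % and 0 %, which mirrors the kind of dominance of a single input (HWT) reported for the MED system.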
The parametric effect of the input variables on distillate production is normalized, and the resulting percentage significance of each input variable is presented in Fig. 6. Hot water temperature (HWT) is the most substantial factor for distillate production, with a significance value of 95.6 %. As the hot water temperature increases, the energy input to the system increases, which boosts heat transfer in the evaporator and results in higher vapor generation and, consequently, higher distillate production. Feed water temperature (FWT) is the second most significant variable, with a significance value of 2 %. At higher feed temperatures, less sensible energy is needed to preheat the feed in the evaporator, so more of the hot water energy is available for evaporation, which increases distillate production. The percentage significance values of feed flow rate-S1 (FFR-S1), feed flow rate-S2 (FFR-S2), and feed flow rate-S3 (FFR-S3) are 2 %, 0.3 %, and 0.1 %, respectively. From domain knowledge of the MED system, a higher feed flow rate requires more sensible energy input to reach the evaporation temperature before vapor is produced, whereas a lower feed flow rate produces dry patches that diminish distillate production. An appropriate feed flow rate should therefore be maintained.

Maximizing the distillate production by genetic algorithm
In this work, the GPR model performed well in modelling distillate production from the MED system based on the identified input variables. The MED system is operated in the lab over the design space of the input variables, and the distillate production dataset is compiled for the experimental conditions. Advanced optimization techniques are then needed to estimate the operating conditions of the input variables that maximize distillate production. The optimization problem therefore takes distillate production as the objective function and explores the design space of the input variables to determine the optimized operating conditions. The optimization problem for maximizing distillate production from the MED system is written as:
$$\max_{\mathbf{x}} \; \hat{f}_{\mathrm{GPR}}(\mathbf{x}), \qquad \mathbf{x} = [\mathrm{HWT},\ \mathrm{FWT},\ \mathrm{FFR\text{-}S1},\ \mathrm{FFR\text{-}S2},\ \mathrm{FFR\text{-}S3}]$$
$$\text{subject to} \quad \mathbf{x}_{\min} \le \mathbf{x} \le \mathbf{x}_{\max}$$
where $\hat{f}_{\mathrm{GPR}}$ is the GPR-predicted distillate production, and $\mathbf{x}_{\min}$ and $\mathbf{x}_{\max}$ are the lower and upper bounds of the operating ranges of the input variables.
The GA-estimated optimized hot water temperature is 70 ± 0.5 °C and feed water temperature is 40 ± 2.5 °C, corresponding to a distillate production of 1.042 LPM, the maximum distillate production estimated for the MED system. The optimized operating ranges for feed flow rate-S1, feed flow rate-S2, and feed flow rate-S3 at this maximum distillate production (1.042 LPM) are 6 ± 2.6 LPM, 7 ± 1 LPM, and 7 ± 1 LPM, respectively. Overall, a higher hot water temperature increases the energy input to the system, boosting heat transfer in the evaporator and thus vapor and distillate production. A higher feed water temperature also increases distillate production because it lowers the sensible energy needed to preheat the feed in the evaporator, leaving more of the hot water energy for evaporation. However, the feed water temperature can only be increased by recuperating energy from the rejected brine or distillate streams, or by using an auxiliary energy source.

Investigation of the GA-driven optimized solution on the MED set-up
ML-based studies in the domain of desalination systems have focused on modelling performance indicators, after which the trained models are deployed for predictive analytics. A closed-loop ML-based modelling and optimization framework for estimating effective operating solutions for desalination systems, especially MED systems, appears to be missing from the literature, which hinders the adoption of ML in this domain. Therefore, the model-based optimized solution for maximum distillate production is validated experimentally to evaluate its accuracy on the MED setup.
The MED system is operated within the optimized operating ranges of the input variables, and the distillate production dataset is compiled. The key input parameters, HWT, FWT, FFR-S1, FFR-S2 and FFR-S3, are carefully maintained within the optimized ranges, because appropriate selection of these parameters increases distillate production. The distribution of distillate production over the optimized operating ranges of the input variables is presented in Fig. 8. During the experiments, the distillate production varied from 0.04 LPM to 1.98 LPM, while the average distillate production remained around 0.98 LPM. The fluctuation in the distillate production can be explained by the sensitivity of the sensors and by flashing of the distillate due to pressure drop. Therefore, the average value of the distillate production is used as the measured distillate production for the experiments.
Comparing the measured average distillate production (0.98 LPM) with the maximum distillate production estimated by GA using the GPR model (1.042 LPM) shows that the model-based optimized solution is close to the experimental observation. This experimental verification supports the effectiveness of the presented ML-based modelling and optimization framework for maximizing the distillate production of the MED system. Thus, the data-driven ML approach can be applied to improve distillate production, thereby contributing to the operational excellence of the MED system.
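The closeness of the two values can be quantified from the reported numbers alone, as the deviation of the GA-estimated maximum from the measured average distillate production:

$$\Delta = 1.042\ \mathrm{LPM} - 0.98\ \mathrm{LPM} = 0.062\ \mathrm{LPM}, \qquad \frac{\Delta}{0.98\ \mathrm{LPM}} \approx 0.063 \;(\approx 6.3\,\%)$$

so the model-based estimate exceeds the measured average by roughly 6 % of the measured value.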

Conclusion
Desalination technologies hold the promise of meeting water demand for different applications in the face of water scarcity and sustainability issues. Improved process design and operation of the MED system can enhance distillate production, boosting the performance of desalination technologies installed worldwide. In this work, an analytical framework that combines machine learning with the rigor of an optimization technique is proposed for the improved design of the MED system to support the circular economy.
• Extensive experimentation is carried out on the in-house built MED system to collect the dataset used to construct the ML-based process models. ANN, SVM, and GPR process models are trained, under rigorous hyperparameter tuning, to predict the distillate production of the MED system. GPR turned out to be the best-performing algorithm, with an R² of 0.99 and an RMSE of 0.026 LPM.
• Monte Carlo technique-based variable significance analysis reveals that hot water temperature is the most significant input variable for distillate production, with a percentage significance of 95.6 %, followed by feed water temperature with a percentage significance of 2 %.
• A close agreement is found between the measured distillate production (0.98 LPM) and the GA-estimated distillate production (1.042 LPM).
• The closed-loop investigation of the model-based optimization results on the MED system, developed through the proposed analytical framework, demonstrates the effectiveness of ML for improved process design, performance enhancement and operational excellence of the MED system, contributing to the circular economy and digitalization of desalination systems.

Future work
In the future, a comprehensive study of different desalination technologies will be conducted using the ML-based modelling and optimization framework. A multi-objective optimization problem will also be investigated for higher energy efficiency and operational excellence of desalination systems.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
3.03 × 10⁻⁷ to 3033.3, and layer size: 1 to 300. For the SVM model, the hyperparameters are tuned within the following ranges: box constraint: 0.001 to 1000, epsilon: 0.00028725 to 28.725, and kernel function: Gaussian, Linear, Quadratic and Cubic. For GPR, the hyperparameters are optimized within the following design space: sigma: 0.0001 to 2.6159; basis function: constant, zero and linear; kernel scale: 0.033639 to 33.639; and kernel function: Nonisotropic Exponential, Nonisotropic Matern 3/2, Nonisotropic Rational Quadratic, Nonisotropic Squared Exponential, Isotropic Exponential, Isotropic Rational Quadratic and Isotropic Squared Exponential. Cross-validation is a rigorous way to reduce the risk of overfitting and thereby improve the generalization ability of a model; in this work, five-fold cross-validation is used to develop the ML models. A grid search combined with Bayesian optimization, using the expected-improvement-per-second acquisition function, is deployed to tune the hyperparameters of the ANN, SVM, and GPR models. The grid search systematically evaluates the effect of selected hyperparameter combinations on the model's predictive performance and keeps exploring further combinations until the ML model achieves good predictive performance. The optimized hyperparameters of the ANN model are: size of the first hidden layer = 233, size of the second hidden layer = 113, activation function = hyperbolic tangent, and regularization strength = 0.00025. The optimized hyperparameters of the SVM model are: kernel function = linear, epsilon = 0.014, and box constraint = 779.37. The optimized hyperparameters of the GPR model are: basis function = zero, kernel function = isotropic exponential, kernel scale = 3.8809, and sigma = 0.6157. Fig. 5 compares the modelling performance of the ANN, SVM, and GPR models in predicting the distillate production of the MED system. The model-predicted responses are compared with the true values, and model predictability is assessed using R² and RMSE. As shown in Fig. 5(a), the ANN model achieves a good fit, with an R² of 0.98 and an RMSE of 0.037 LPM for the predicted distillate production. In contrast, SVM comparatively underperforms on this modelling task, with an R² of 0.84 and an RMSE of 0.103 LPM, as shown in Fig. 5(b). For the GPR-based predictions of distillate production, an R² of 0.99 and an RMSE of 0.026 LPM are obtained (Fig. 5(c)). The true data-distribution and model-based data-distribution

Fig. 4. Data-distribution profiles of the input variables (HWT, FWT, FFR-S1, FFR-S2, FFR-S3) and the output variable (distillate production) of the MED system. The data distributions are asymmetric, with observations distributed continuously over the operating ranges of the variables.
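The hyperparameter search described above was carried out in MATLAB. To illustrate the selected GPR configuration (zero basis function and isotropic exponential kernel) together with five-fold cross-validation and the R² and RMSE metrics, the NumPy sketch below implements a zero-mean GP posterior mean with the kernel σ_f² exp(−r/ℓ) and picks the kernel scale ℓ by plain grid search on the cross-validated RMSE. Several parts are assumptions: the reported sigma and kernel scale are taken to correspond to `noise_std` and `length_scale`, the signal standard deviation is fixed at 1, grid search replaces the Bayesian optimization used in the paper, and `make_toy_data` generates hypothetical synthetic data rather than the MED dataset.

```python
import numpy as np


def iso_exponential_kernel(A, B, length_scale, signal_std=1.0):
    """Isotropic exponential kernel: signal_std**2 * exp(-||a - b|| / length_scale)."""
    dists = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    return signal_std ** 2 * np.exp(-dists / length_scale)


def gpr_predict(X_train, y_train, X_test, length_scale, noise_std, signal_std=1.0):
    """Posterior mean of a zero-mean GP (no basis function) at the test inputs."""
    K = iso_exponential_kernel(X_train, X_train, length_scale, signal_std)
    K[np.diag_indices_from(K)] += noise_std ** 2  # add observation noise variance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^-1 y
    return iso_exponential_kernel(X_test, X_train, length_scale, signal_std) @ alpha


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def kfold_cv_scores(X, y, length_scale, noise_std, k=5, seed=0):
    """Out-of-fold (RMSE, R^2) of the GP under k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    preds = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        preds[fold] = gpr_predict(X[train], y[train], X[fold], length_scale, noise_std)
    return rmse(y, preds), r2(y, preds)


def grid_search(X, y, length_scales, noise_std, k=5):
    """Plain grid search: keep the kernel scale with the lowest k-fold CV RMSE."""
    scores = {ls: kfold_cv_scores(X, y, ls, noise_std, k=k) for ls in length_scales}
    best = min(scores, key=lambda ls: scores[ls][0])
    return best, scores[best]


def make_toy_data(n=100, seed=42):
    """Hypothetical synthetic data on two scaled inputs; NOT the MED dataset."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, size=n)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best_ls, (cv_rmse, cv_r2) = grid_search(X, y, (0.1, 0.3, 1.0, 3.0), noise_std=0.05)
    print(f"kernel scale = {best_ls}, CV RMSE = {cv_rmse:.3f}, CV R^2 = {cv_r2:.3f}")
```

Scoring the pooled out-of-fold predictions, rather than in-sample fits, is what makes the reported R² and RMSE a guard against overfitting.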

Subject to:

$$\begin{aligned}
38\ ^\circ\mathrm{C} &\le \mathrm{HWT} \le 70\ ^\circ\mathrm{C}\\
34\ ^\circ\mathrm{C} &\le \mathrm{FWT} \le 42\ ^\circ\mathrm{C}\\
3.6\ \mathrm{LPM} &\le \text{FFR-S1} \le 8.7\ \mathrm{LPM}\\
6.1\ \mathrm{LPM} &\le \text{FFR-S2} \le 8.1\ \mathrm{LPM}\\
5.8\ \mathrm{LPM} &\le \text{FFR-S3} \le 7.8\ \mathrm{LPM}
\end{aligned}$$

The optimization problem is solved with the genetic algorithm solver in MATLAB R2021b, using the default values for the number of generations, population size, and other settings. Fig. 7 maps the optimized values of hot water temperature and feed water temperature determined by the GA onto the distillate production profile. These two input variables are used to construct the profile because together they account for 97.6 % of the significance towards distillate production.

Fig. 5. Development of ML models to predict the distillate production: (a) ANN, (b) SVM, and (c) GPR. The GPR model shows better modelling performance than the ANN and SVM models. The 95 % prediction interval is also constructed for the model-based predictions.

Fig. 6. Significance percentages associated with the input variables towards the prediction of distillate production. HWT is the most significant variable for distillate production, followed by FWT, FFR-S1, FFR-S2, and FFR-S3.

• The trained GPR model is integrated into the optimization problem for maximizing distillate production, and the problem is solved by a genetic algorithm. The optimized operating conditions for maximum distillate production are estimated as follows: hot water temperature = 70 ± 0.5 °C, feed water temperature = 40 ± 2.5 °C, feed flow rate-S1 = 6 ± 2.6 LPM, feed flow rate-S2 = 7 ± 1 LPM and feed flow rate-S3 = 7 ± 1 LPM.
• The optimized operating values of the input variables are tested on the experimental MED system.

Fig. 7. Mapping the genetic algorithm-driven optimal solution for maximum distillate production on the response curve of distillate production constructed against the two significant input variables: hot water temperature and feed water temperature.

Fig. 8. Investigation of the model-based optimized solution for maximum distillate production. A close agreement between the GA-estimated and experimental values of distillate production is observed.