Smart Development of Big Data App for Determining the Modelling of Covid-19 Medicinal Compounds Using Deep AI Core Engine System

The Smart Big Data App Using Deep Artificial Intelligence (AI) Core Engine System addresses the difficulty of building a prototype computer simulation, similar to an in silico study, as an initial model of complex Covid-19 medicinal compounds on Big Data ecosystems such as Hadoop and Spark. One difficulty is that it is still very arduous to measure the mixing ratio of a compound when it is combined with many other compounds while weighing the trade-off between minimizing the negative effects and maximizing the positive effects. In addition, the computation time becomes very long on a system outside the Big Data ecosystem, because it grows with the large number of widely varying compounds and with the number of different conditions of Covid-19 patients, with or without comorbid diseases. The problem therefore requires a very complex computational process that is difficult to model with a conventional mathematical approach. Meta-heuristic algorithms such as Particle Swarm Optimization (PSO) are much easier to apply, but they require a very large particle population and many iterations to achieve global convergence. As a consequence, fast computation based on distributed computing, such as the Big Data ecosystem, is required. Based on this review of the problems, the system was built on Computational Intelligence so that end users, especially developers, can build an application more easily despite the complex computations, one of which is the meta-heuristic technique that minimizes the negative effects and optimizes the positive effects of the medicinal compounds by including a large amount of data to obtain a better model. Therefore, the prototype modeling project can be quick and robust, and can achieve high performance measurements.


Introduction
Covid-19, which has become a global pandemic, is caused by a new type of virus and is still very difficult to control because a vaccine or cure has not yet been discovered [1]. In addition, the number of suspected and positive cases continues to increase, since the Corona virus (Covid-19) spreads very fast and can survive for quite a long time. Therefore, solutions are required from various parties: government institutions, all elements of society, medical scientists who carry out clinical trials, and scientists from the field of informatics who create management optimization, such as lockdown management in the form of isolation, semi-isolation, and social distancing, in the hope that the pandemic will be resolved immediately and that the extraction and drug formulation can be found as soon as possible. In the field of informatics, this is in line with technological developments and with the development of Big Data applications that blend conventional coding in the native language of Big Data, which is Java, with coding in high-level programming languages. Examples are Hadoop Streaming, through which Hadoop, which initially could only be run through batch processing, can be run using the Python language, and PySpark Streaming, all of which lead toward, or are, serverless-based approaches. This is highly favorable for producing fast and reliable applications for solving cases in general [2]. In particular, the case resolved in this study was to determine the modeling of the Covid-19 medicinal compound. The modeling focused on optimizing the percentage of substances in the candidate medicinal ingredients using meta-heuristic techniques, because the equivalent mathematical computation is quite complicated and onerous. These ingredients are taken from herbal and non-herbal types.
In previous research, active compounds in herbs or herbal medicine were predicted from Naodesheng prescription ingredients using the Support Vector Machine (SVM). The results suggested that SVM outperformed the other 3 methods, but the study did not optimize the percentages of the recipe mixture to maximize its efficacy [3]. Another study developed a classification system for herbal properties based on composition using an SVM and applied the k-means clustering algorithm as the feature selection method. The clustering succeeded in reducing the data dimensions to 3047 samples of herbal medicine with 236 plants and plant species as features. SVM classification with feature selection achieved an accuracy of 71.5% [4]. Subsequent research on Gorontalo medicinal plants focused only on identifying images of the plants. That study used digital image processing, with segmentation and extraction techniques, artificial neural network algorithms, and k-means clustering, on a relatively small dataset, which resulted in a less than optimal evaluation [5]. A further study searched for new drug candidates in a database of ancient herbal compounds using unsupervised pattern discovery algorithms. That study is very similar to ours; however, it included no experimental process for optimizing the dosage, that is, the concentration or percentage of the compound mixture. The herbal compound data, taken from the rheum database, comprised 150 recipes from ancient herbal documents, with combinations of 255 herbal compounds to cure rheumatism [6].
A problem that often arises when developing Big Data applications is the readiness of the infrastructure, whether physical hardware serving as an adequate server or a non-physical server in the cloud. Both are related to backend and frontend development, where the frontend is by default usually a console. It is therefore very difficult for developers to create a visual application with a web or mobile interface as the frontend. Consequently, in this study a prototype liaison between the frontend and the backend was created to solve the problem of determining the Covid-19 medicinal compound. The backend is the default tooling from Apache, which by default employs WebHDFS and the Spark API, with an installed algorithm-based Deep AI Core Engine System as the intelligent-system solution to very complex modeling conditions; both can be accessed and run via a console in the terminal or via the Web or Mobile interface [7][8][9][10]. We then created an easier solution using the Django framework, which acts as both the frontend and the backend of the Web App. The Mobile App acts only as a frontend; its backend is also hosted in the Django framework environment through a RESTful API whose requests are processed via Spark in streaming and non-streaming modes.
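To make the role of WebHDFS in such a frontend-to-backend liaison more concrete, the following minimal sketch builds the kind of REST URL that a web or mobile backend could request in order to read a file from HDFS. The host name, file path, and helper name `webhdfs_url` are hypothetical and chosen only for illustration; port 9870 is assumed as the NameNode HTTP port (the Hadoop 3 default), and the sketch only constructs the URL without contacting any cluster.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=9870, **params):
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>.

    host, port and path are illustrative assumptions, not values from the paper.
    """
    query = urlencode({"op": op, **params})
    # quote() keeps "/" so the HDFS directory structure stays visible in the URL.
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical example: URL for reading a dose dataset stored in HDFS.
url = webhdfs_url("namenode.example", "/user/demo/doses.csv", "OPEN")
print(url)
```

A web framework view could issue an HTTP GET to such a URL and pass the returned data on to the computation engine, which keeps the visual frontend independent of the console-based tools.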

Covid-19 and Herbal Medicinal Candidates
Novel coronavirus (2019-nCov), or Covid-19, is a new virus that induces respiratory disease. This virus, which originated in China, belongs to the same family as the viruses that cause SARS and MERS, for which a vaccine has not yet been discovered [1][11]. Meanwhile, herbal medicine (known as "Jamu"), Standardized Herbal Medicine, and Fitofarmaka are the 3 kinds of medicine made in Indonesia from natural ingredients such as roots, leaves, fruits and animals [4][12][13]. A number of Indonesian researchers are attempting to find a drug as an antidote for Covid-19, ranging from spice (empon-empon) plants, studied by Prof. Nidom from Airlangga University (UNAIR), to propolis, studied by Mr. Muhammad Sahlan from the University of Indonesia (UI), as well as fruits and other natural ingredients [14]. Based on Fig. 1, the clinical symptoms of Covid-19 include fever, cough, runny nose, respiratory problems, sore throat, fatigue and lethargy. The Corona virus can be prevented in the following ways, called GERMAS, which refers to the movement for people to live a healthy life [1]:
• Wash your hands frequently with soap.
• Always wear a mask, especially when you have a cough or cold.
• Consume balanced nutrition and eat more vegetables and fruit.
• Be careful in having contact with animals.
• Exercise regularly and get enough rest.
• Immediately go to a health facility when having a cough, runny nose and shortness of breath.
Covid-19 has symptoms similar to those of other diseases such as flu, pneumonia and allergies. The detailed set of clinical symptoms is fever, dry cough, cough with phlegm, lethargy, shortness of breath, joint aches / pains, headache, sneezing, runny nose, nasal congestion, watery eyes, sore throat, and diarrhea. Each of these symptoms is described on a scale of severe, often, sometimes, rarely, and none. At the end of January 2020, Chinese scientists published the protease structure of the Covid-19 virus, which is the part that Covid-19 uses to attach to the host cells. The publication includes a synthetic molecule, called N3, which can act as an inhibitor. Based on this publication, Sahlan and his colleagues immediately modeled the structure with several chemical compounds taken from propolis, which are considered capable of attaching to the protease structure of the Covid-19 virus so that the virus cannot attach to human cells. Sahlan revealed that, in order to proceed to preclinical testing (animal trials) and clinical trials (human trials), he required a sample of the Covid-19 virus. Meanwhile, Prof. Chairul Anwar Nidom proposed a different approach. He advised the public to consume spices (empon-empon) to prevent and protect themselves from Covid-19. He emphasized that when the bird flu virus infects the lung cells, it triggers an immune response involving what are referred to as cytokines. Cytokines in the lungs not only fight viruses but also damage lung cells. In short, viral infection causes a cytokine storm in the lungs. He observed that in a Covid-19 patient with severe pneumonia, the immune response was almost the same as the immune response driven by the bird flu virus. He hypothesizes that the immune response to Covid-19 is much lighter than that against bird flu. Therefore, he recommends that people use spices (empon-empon) to fortify themselves against the Covid-19 virus.

Map Reduce on Big Data Ecosystem
MapReduce (MR) is a technique for processing very large data in parallel; hence, it is useful for computing on and analyzing large-scale data using a cluster of many computers. MapReduce has two main processes: map (which consists of splitting and mapping) and reduce (which includes shuffling and reducing). In the map process, the master node divides (partitions) the input data into smaller sub-problems and distributes them to the worker (slave) nodes. In the reduce process, the master node collects the results of the sub-problems and combines them into the final output. An illustration of the entire MR process, or pipeline, can be observed in Fig. 2 [15][16][17].

Figure 2. MR Pipeline
Based on the concepts and illustration in Fig. 2, MR can perform computations ranging from simple to complex. Fig. 3 is an example of a simple computation on MR: calculating the average of a dataset of numbers, for example 6 values. The process starts at the master computer, which assigns a key to each numerical value of the initial dataset and distributes the values to the slave computers (slave nodes). Each slave computer thus receives its share of the data split according to K % #Partition, where K refers to the number (key) of the data item and #Partition represents the number of slave computers (workers) that exist.
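The averaging example can be sketched in a few lines of plain Python that imitate the map and reduce phases on a single machine. This is only an illustration under stated assumptions: K is interpreted as the integer key given to each value, the six numbers and the choice of 3 partitions are made up, and the function names are hypothetical; a real job would run on Hadoop or Spark rather than in one process.

```python
from collections import defaultdict


def map_phase(values, num_partitions):
    """Master step: key each value and route it to partition key % num_partitions."""
    partitions = defaultdict(list)
    for key, value in enumerate(values):
        partitions[key % num_partitions].append(value)
    return dict(partitions)


def worker_map(chunk):
    """Worker step: emit a partial (sum, count) pair for the received chunk."""
    return sum(chunk), len(chunk)


def reduce_phase(partials):
    """Reduce step: combine all partial pairs into the global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def mapreduce_average(values, num_partitions=3):
    partitions = map_phase(values, num_partitions)
    partials = [worker_map(chunk) for chunk in partitions.values()]
    return reduce_phase(partials)


dataset = [4, 8, 15, 16, 23, 42]  # illustrative values, not from the paper
average = mapreduce_average(dataset, num_partitions=3)
print(average)  # 18.0
```

Emitting (sum, count) pairs instead of local averages keeps the result exact even when the partitions receive different numbers of values.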

Deep Artificial Intelligence (AI) Core Engine System
The Core Engine System is built as the proposal and contribution of this research. It combines technologies that range from extraction (reading the data) and the computation process to transferring the results. In addition, it can serve as a core backend engine that runs with or without Web and Mobile applications as the frontend. The following details the workflow stages of the algorithm-based Core Engine System on Computational Intelligence (CI), which has been modified and developed from previous research [18][19] by adding metaheuristic algorithms for optimization [20]:

Setting up a Web App and Mobile App interface (frontend) or other interface, with or without a RESTful API.
The value of the severity level is also set, for example, in the interval from 0 to 1. For example, from field observations, the rating for the condition of patient 1 is [0.35, 0.49, 0.52, 0.33, 0.88, 0.8, 0.5, 0.0, 0.55]. The closer a value is to 0, the lower the severity level, approaching the state in which the patient is declared recovered (this applies to all 9 columns of the patient's condition); conversely, the closer a value is to 1, the worse the patient's condition. The first step of the computation process uses an Artificial Neural Network (ANN) based on "Deep AI based ELM which involves Training process and Testing process" to obtain the optimal weight model of the relationship between the medicinal dose given as input (X) and the resulting effects on the health condition of Covid-19 patients (Y), where X is the input layer and Y is the output layer. The second step uses the model obtained from the Deep AI based ELM in step 1 as the input model for optimizing the element values in the output layer, so that all condition values of the patient come close to zero (meaning that no severity remains), in accordance with the results expected under the supervision of doctors and health workers. The algorithm used in this second step is based on "PSO" and "Deep AI based ELM which only involves the Testing process". The final result is therefore a prototype of the optimal dose values, serving as a recommendation for the mixture of compounds as candidates for the Covid-19 medicine from the existing herbal or non-herbal ingredients (if non-herbal ingredients are actually used) and, ultimately, as a treatment recommendation based on a general dose for all patient conditions.
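The two steps described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: ELM is assumed here to mean an Extreme Learning Machine, reduced to a single random hidden layer with ridge-regularized output weights, and the training data, the number of ingredients (4), the hidden size, and all PSO parameters are synthetic assumptions. Step 1 fits dose (X) to severity (Y); step 2 runs PSO over doses in [0, 1], using only the prediction (testing) pass of the trained model as the fitness function.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [a | b]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        p = m[i][i]
        m[i] = [v / p for v in m[i]]
        for r in range(n):
            if r != i:
                f = m[r][i]
                m[r] = [vr - f * vi for vr, vi in zip(m[r], m[i])]
    return [row[n:] for row in m]


def hidden_layer(X, W, b):
    """Sigmoid activations of the randomly weighted hidden layer."""
    return [[1.0 / (1.0 + math.exp(-(sum(x * w for x, w in zip(row, w_j)) + b_j)))
             for w_j, b_j in zip(W, b)] for row in X]


def elm_train(X, Y, n_hidden, rng, ridge=1e-3):
    """Step 1: random hidden weights, output weights by regularized least squares."""
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    Ht = transpose(H)
    A = matmul(Ht, H)
    for i in range(n_hidden):
        A[i][i] += ridge  # keeps the system well conditioned
    return W, b, solve(A, matmul(Ht, Y))


def elm_predict(model, X):
    W, b, beta = model
    return matmul(hidden_layer(X, W, b), beta)


def pso_minimize(fitness, dim, rng, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5):
    """Step 2: standard global-best PSO with positions clamped to [0, 1]."""
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history


rng = random.Random(0)
# Synthetic data (assumption): 30 instances, 4 ingredient doses, 9 severity columns.
X = [[rng.random() for _ in range(4)] for _ in range(30)]
Y = [[min(1.0, 2 * abs(x[j % 4] - 0.5)) for j in range(9)] for x in X]
model = elm_train(X, Y, n_hidden=10, rng=rng)


def fitness(dose):
    """Mean squared predicted severity; lower means closer to recovery."""
    pred = elm_predict(model, [dose])[0]
    return sum(v * v for v in pred) / len(pred)


best_dose, best_fit, history = pso_minimize(fitness, dim=4, rng=rng)
```

The list `history` records the best fitness after each iteration, so plotting it gives a convergence curve; because the global best is only ever replaced by a better value, the curve cannot increase.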
The full code of our project on in silico-like computer simulation for determining the modelling of Covid-19 medicinal compounds by optimizing the dosage content (the concentration or percentage of the compound mixture), including demos, is available on our webpage: https://github.com/imamcs19/Big-Data-App-in-Silicoto-Determine-Modelling-Covid-19-Medicinal-by-Optimizing-Dose-of-Compound-Mix

Results and Discussion
Row-wise, the dataset used in this study consisted of 6 simulated treatments, each comprising 30 simulated instances, as well as a dataset of Covid-19 patients. Column-wise, the data were partitioned into 2 parts: the ingredient data were entered as the input layer, while the symptom data that arose in the patients were entered as the output layer, or target, and then processed by machine learning using the Deep AI based ELM algorithm. The next step was the optimization process, in which the features, or substances, contained in each ingredient were extracted, given an interval for the percentage of the dose used, and randomly initialized to carry out the metaheuristic process using the PSO algorithm. The focus of this research is to perform simulations to determine a combination model of natural mixtures (such as herbs) with non-herbal ones (such as synthetic mixtures or synthetic drugs) that contain certain compounds, as a Covid-19 drug candidate, in the manner of an in silico test, which uses computer simulation. It does not focus on finding new drug compounds but on modeling and optimization, similar to previous research on drug formulations [21]: natural and non-natural or synthetic mixtures are combined, the optimal dosage percentage for the patient is determined, and the magnitude of the effect on the patient's condition is then observed using PSO and the ANN based ELM algorithm. A compound has a fixed ratio of constituents, while a mixture has a non-fixed ratio. In this study, the non-fixed ratio was represented by a dose weighting factor for each mixture involved in preparing the drug formulation, used as the particle value in the PSO algorithm. Examples of mixtures are seawater, crude oil, avocado juice, avocado milkshake, and coffee milk, whereas examples of compounds are water (H2O), sodium chloride (NaCl), and baking soda (NaHCO3) [22].
The first step in implementing the simulation is to learn from a dummy dataset in order to find the model of the relationship between the mixture of herbal and / or non-herbal ingredients and the patient's condition; the final step is to optimize the weighting factors of several natural mixtures to find the optimal mixture. The convergence test results in Fig. 6, obtained during the optimization process, indicate that no matter how complex and difficult the relationship between the given formulation of herbal and non-herbal medicine mixtures (X) and the patient's condition (Y), the computer simulation can still readily obtain an optimal model with PSO and the ANN based ELM. Therefore, the final results of this study, in the form of recommendations for the optimal composition of drug formulations, have great potential for further development, for example as initial test materials for Covid-19 drug candidates in preclinical testing (in vitro and in animals) before being tested clinically (clinical trials) in humans [23][24][25]. Based on the movement of the evaluation values computed with the fitness formula, the process from training to optimization of the types of ingredients performed very well. This is evident from Fig. 6, which shows that the evaluation value has reached the ideal condition, i.e. the global convergence condition.
Medicine that has Reached a Convergent (Stable) Condition.

Conclusion
Convergence testing over several trials has shown that the more iterations are used, the better the fitness value becomes. This indicates that the Deep AI based ELM and PSO algorithms are well suited as a modeling approach for specifying the dose of compounds in drug candidates for many diseases, and in particular for Covid-19, which was the focus of this study. In the future, it is expected that this research can first be tested on animals, also modeling the resulting effects with the Deep PSO algorithm, before being applied to humans. The reason is that this research is limited to building a prototype that models the relationship between the dose of herbal and non-herbal ingredients (X) and its later effects on the patients' condition (Y), using a meta-heuristic technique whose computation remains easy even in complex cases. Future research is expected to extend the modeling from X → Y to parameters that take a kind of time series into account, such as {X (time), Y (time)} → Y (time + 1), {X (time + 1), Y (time + 1)} → Y (time + 2) and so on, where the time increment can be based on the frequency of visits to the patient, for example hourly (+1), or can be flexible.
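The proposed time-series extension can be made concrete with a short sketch that turns per-visit records into {X (time), Y (time)} → Y (time + 1) training pairs. The function name `make_time_series_pairs` and the example dose and severity values are hypothetical, and a uniform step between visits is assumed for simplicity.

```python
def make_time_series_pairs(x_seq, y_seq):
    """Build ({X(t), Y(t)} -> Y(t+1)) pairs from per-visit dose and condition records."""
    if len(x_seq) != len(y_seq):
        raise ValueError("x_seq and y_seq must cover the same visits")
    inputs, targets = [], []
    for t in range(len(x_seq) - 1):
        # Input: dose given at visit t concatenated with condition observed at visit t.
        inputs.append(list(x_seq[t]) + list(y_seq[t]))
        # Target: condition observed at the next visit.
        targets.append(list(y_seq[t + 1]))
    return inputs, targets


# Illustrative records for three visits: two dose values and one severity value each.
doses = [[0.2, 0.5], [0.3, 0.5], [0.3, 0.4]]
severity = [[0.9], [0.6], [0.4]]
inputs, targets = make_time_series_pairs(doses, severity)
print(inputs)   # [[0.2, 0.5, 0.9], [0.3, 0.5, 0.6]]
print(targets)  # [[0.6], [0.4]]
```

The resulting pairs could replace the static X and Y matrices as the input and output layers of the learning step, so that the model predicts the next observed condition rather than a single final one.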