Prediction of Reproductive System Affectation in Sprague Dawley Rats by Food Intake Exposed with Fenthion, Using Naïve Bayes Classifier and Genetic Algorithms

The improper application of pesticides to agricultural crops, and the indirect effects of exposure through the consumption of contaminated crops, currently represent a serious risk to public health. It is therefore vital to know the degree of toxicity of each of these chemicals in order to regulate their application properly and to raise awareness among the population at risk. This paper presents the results of an algorithm able to predict the effects on the reproductive system of Sprague Dawley rats caused by the intake of food exposed to Fenthion. The original data were processed using the Naïve Bayes classifier, which was then optimized using genetic algorithms. It is concluded that the prediction algorithm performs adequately, processing qualitative information at relatively low computational cost, which makes it easy to port to different development platforms.

Currently, given global food demand, the use of pesticides in agriculture has become essential to achieving optimal crop yields. To show the impact of pesticides on health, studies have been published documenting how these compounds are applied indiscriminately and describing the effects produced by direct or indirect exposure 1,2,3,4. However, regulators of pesticide application do not define good pesticide management practices, which raises concern among local prevention agencies about how to train farmers adequately in order to avoid health consequences 5,6,7. Additionally, it is common to find patients with clinical pictures of accidental poisoning caused by these products, so timely care is vital to minimize the risks and consequences for human health in such accidents 8.
Fenthion (CAS 55-38-9) is an organophosphate used for pest control on agricultural crops, in public health cases and in residential settings. Although the compound degrades quickly in the environment, its effects on the reproductive system 9,10 remain a concern for the health agencies responsible for regulating its application. Hence the importance of poisoning-prevention practices, such as promoting education on good management practices, raising awareness of the health risks of poisoning, and conveying the importance of properly monitoring the application of a pesticide to food 11,12,13.
Medicine and toxicology as we know them today would not have had the same impact without scientific experimentation on animals. It is therefore essential to develop studies that can predict the effects of eating pesticide-contaminated food on each of the body systems of these animals, since such studies are considered an essential approach to studying and improving human health and to ensuring public health safety 14,15.
Some research focused on identifying and predicting the health effects of pesticide consumption 16,17,18 demonstrates the usefulness of machine learning techniques such as Naïve Bayes classifiers and neural networks. However, the accuracy of these techniques decreases slightly when a large number of predictor variables is handled.
Based on the above, this study aims to develop an algorithm that predicts the effects on the reproductive system caused by the intake of Fenthion in Sprague Dawley rats, using machine learning techniques optimized with genetic algorithms. The algorithm is intended to be flexible enough to analyze multiple databases, for the study of health effects in both animals and humans, and for several pesticides that are commonly used or of interest in the region. Finally, the technique is intended to be easily ported to other development platforms that allow information to be distributed to the general population, thus conveying the importance of adopting good management practices and warning of the effects of direct exposure to pesticides.

METHODS AND MATERIALS
First, data were collected from the ToxRefDB database 19, which presents the results of in vivo toxicity studies in different animals, according to the chemical of interest. For this work, the available information from a laboratory test with Sprague Dawley rats was processed, given the strain's recognized suitability as a model organism 20. In that study, Fenthion with a purity of 96.9% was administered orally for a period of 10 weeks to rodents of both sexes 21.
The collected information was filtered to obtain the predictor variables and the variable to predict. The predictor variables were sex, the applied dose (mg/kg/day) and the generation in which changes were observed in the rats. The variable to predict was the effect or final alteration observed in the rodent, if any, during the experiment (Figure 1).
The available information was processed with the Naïve Bayes (NB) classifier 22, which works with qualitative data, assumes that the predictor variables are independent of each other, and performs well in supervised learning applications 23,24. The original data were randomly divided into two groups: the first for training the classifier and the second for testing it. Under these assumptions, and applying Bayes' theorem, the probability of an event occurring given a condition is calculated according to Eq. 1.
... (1)
For this problem, the NB classifier estimates the probability (prediction) of a variable (final alteration) given a set of predictor variables (sex, dose and generation), as defined in Eqs. 2 and 3.
... (2)
... (3)
It follows that if any term in the product equals zero (0), the whole partial probability becomes zero and the final calculation is distorted. This happens when the probability is computed for a variable value that does not appear in the data available during the training stage. The problem was solved by applying Laplace smoothing 25, in which the counter of each joint occurrence is initialized to one (1).
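The classification step described above can be illustrated with a minimal Python sketch. It is not the authors' C# implementation: the class name `CategoricalNaiveBayes`, the toy records and the labels `"reproductive"` and `"none"` are hypothetical and are not taken from ToxRefDB. It assumes that Laplace smoothing takes the usual add-one form, in which each conditional probability is (count + 1) / (class count + number of categories seen for that predictor).

```python
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for qualitative predictors with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # Distinct values seen in training for each predictor column.
        self.values = [set() for _ in rows[0]]
        # counts[(column, label)][value] = number of joint occurrences.
        self.counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.values[i].add(value)
                self.counts[(i, label)][value] += 1
        return self

    def predict_proba(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            score = self.class_counts[c] / total  # class prior
            for i, value in enumerate(row):
                k = len(self.values[i])  # categories seen for this predictor
                # Add-one smoothing keeps every factor strictly positive.
                score *= (self.counts[(i, c)][value] + 1) / (self.class_counts[c] + k)
            scores[c] = score
        norm = sum(scores.values())
        return {c: s / norm for c, s in scores.items()}


# Hypothetical toy records: (sex, dose level, generation) -> final alteration.
rows = [("F", "high", "P1"), ("F", "high", "F1"),
        ("M", "low", "P1"), ("M", "low", "F1")]
labels = ["reproductive", "reproductive", "none", "none"]

model = CategoricalNaiveBayes().fit(rows, labels)
# ("F", "low") never occurs together with either class in the toy data, so
# without smoothing both class scores would collapse to zero.
probs = model.predict_proba(("F", "low", "P1"))
```

In this toy case the smoothed factors are symmetric, so both classes receive the same probability instead of an undefined 0/0 result.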
The accuracy of the NB classifier was evaluated separately for the training and testing sets. For this purpose, the probability for each of the original records in each set was calculated, and the true positives and false negatives were counted using Eq. 4. A record was counted as a true positive when its probability was greater than 50%. Finally, the accuracy of the classifier was calculated using Eq. 5.
... (4)
... (5)
To improve the accuracy of the NB classifier, techniques based on genetic algorithms were applied, taking several classifiers as the individuals of a population. To optimize the accuracy, the variables modified were the proportion of data assigned to training and testing for each classifier and the allocation of the records to these two sets, which was changed by altering the value of the random seed used to distribute the data.
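The two quantities varied by the optimization, the training proportion and the random seed, together with the 50% threshold used for evaluation, can be sketched in Python as follows. The function names `split_records` and `accuracy` are hypothetical, and the sketch assumes that accuracy is the share of records counted as true positives, that is TP / (TP + FN); the exact form of Eqs. 4 and 5 is not reproduced here.

```python
import random


def split_records(records, train_fraction, seed):
    """Shuffle with a fixed seed, then cut into training and testing sets."""
    rng = random.Random(seed)  # same seed -> same allocation of records
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(probabilities, threshold=0.5):
    """Share of records whose observed outcome received a probability above
    the threshold. Expects a non-empty list of probabilities."""
    true_positives = sum(1 for p in probabilities if p > threshold)
    false_negatives = len(probabilities) - true_positives
    return true_positives / (true_positives + false_negatives)


# Example with 100 placeholder record indices, a 51% training share and
# seed 1916 (the values reported for the best run in the results).
train, test = split_records(range(100), 0.51, 1916)
```

Because the allocation of records depends only on the seed and the fraction, each pair of values defines one candidate classifier, which is what lets a genetic algorithm treat classifiers as individuals.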
The most relevant parameters are a randomly generated initial population, real-valued phenotypes, a genotype length of 14 bits, genotype-phenotype conversion through Gray code, roulette-wheel crossover, elitism, and mutation that alters one (1) random gene in the genotype of the selected individual. The algorithm was programmed and run on a PC whose most relevant technical specifications were an Intel® Core™ i5-2500 processor (4 cores at 3.3 GHz), 8 GB of RAM and the Windows 10 x64 operating system. Figure 2 shows the pseudocode of the proposed algorithm.
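The genetic operators listed above can be sketched in Python under some assumptions. The text does not specify how a 14-bit genotype is mapped to the training proportion or the seed, so the `low` and `high` bounds of `decode` are placeholders, and all function names are hypothetical. Roulette-wheel selection is shown as the way parents would be picked for crossover, which is one reading of the description.

```python
import random

GENOTYPE_BITS = 14  # genotype length stated in the text


def gray_to_int(bits):
    """Decode a Gray-coded bit list (most significant bit first)."""
    value = 0
    previous = 0
    for bit in bits:
        previous ^= bit  # binary bit = previous binary bit XOR Gray bit
        value = (value << 1) | previous
    return value


def decode(bits, low, high):
    """Map a genotype onto a real-valued phenotype in [low, high]."""
    return low + (high - low) * gray_to_int(bits) / (2 ** len(bits) - 1)


def mutate(bits, rng):
    """Return a copy of the genotype with one randomly chosen gene flipped."""
    child = list(bits)
    i = rng.randrange(len(child))
    child[i] ^= 1
    return child


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitness, k=1)[0]


rng = random.Random(0)
parent = [rng.randint(0, 1) for _ in range(GENOTYPE_BITS)]
child = mutate(parent, rng)
```

Gray code is a common choice here because neighbouring integers differ in a single bit, so a one-gene mutation can move the phenotype by a small step.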

RESULTS AND DISCUSSION
The proposed algorithm was implemented in an application written in C#, which reads and processes the database information contained in .xlsx files and automatically builds the sets of values of each predictor variable used to predict the desired effect. The input parameters for optimizing the classifier include the number of individuals in the population, the proportions of the evolution techniques (elitism, crossover and mutation) and the stop criteria (number of iterations and tolerance). Table 1 shows the results for each of the possible combinations for predicting effects on the reproductive system.
First, alterations caused by exposure to Fenthion vary in direct proportion to the dose, regardless of whether or not the alterations occur in the reproductive system. At low dosage levels, the odds of future generations not suffering effects increase significantly. Second, male rats show greater resistance than females to effects on the reproductive system, regardless of the dose and the generation studied. In addition, the probability of effects on the reproductive system is considerably higher in the parents of the first generation, while subsequent generations are susceptible to effects on other systems, a factor that is also influenced by the dosage level. Finally, across the different dose levels, the likelihood of abnormalities in the reproductive system is virtually unchanged, while the probability of effects in other systems varies markedly, increasing over future generations.
The optimization of the NB classifier was also analyzed by varying the population size and the proportions of the evolution techniques. Cases with a low, normal and high number of individuals (10, 25 and 60, respectively) were tested. In the first case, the evolution of the population showed no tendency to reduce the classifier error, irrespective of the proportions of the evolution techniques. The best result was obtained in the second case, shown in Figure 3, which was run with elitism, crossover and mutation proportions of 15, 85 and 30%, respectively, and gave a general error of 2.25% with random seed 1916 and a training-test proportion of 51.00% and 49.00%, respectively. Finally, in the third case only small variations in the classifier error were observed, with a population that hardly evolved over the generations.
The algorithm for classifier optimization behaved appropriately when the number of individuals in the population was changed, as shown in Table 2. For example, the ratio of training to testing data adjusts accordingly: if the number of individuals is low, the proportion of data is better balanced, and vice versa. In addition, the accuracy of the classifier converges to the same range of values when different seeds are used for the random distribution of the original data.
Fig. 3. Evolution of classifier accuracy over generations
The NB classifier error was reduced considerably by the genetic algorithms, in contrast to running the classifier with default parameters, where the error may worsen when the random seed and the proportion of training-testing data are chosen manually.

CONCLUSIONS
The probability of effects on the reproductive system is considerably higher for first-generation rats at doses between 0.05 and 0.1 mg/kg/day. Despite the risk of inheriting these effects, this probability decreases significantly over successive generations, whereas at higher dosage levels, from 0.70 to 5.00 mg/kg/day, the risk of effects in other systems gradually increases. Regardless of the dosage and the generation studied, male rats proved more resistant to effects on the reproductive system. It was possible to propose and implement an algorithm able to predict effects on the reproductive system caused by the ingestion of Fenthion, through the processing of qualitative and quantitative data, using the Naïve Bayes classifier optimized with genetic algorithms.
The algorithm presented in this work can support other effect-prediction analyses for other animal species or for human studies, depending on data availability, because the technique is robust enough to be applied to other databases and to predict effects on other specific systems for several pesticides.