The efficiency of the RULES-4 classification learning algorithm in predicting the density of agents

Learning is the act of obtaining new or modifying existing knowledge, behaviours, skills or preferences. The ability to learn is found in humans, other organisms and some machines. Learning is always based on some sort of observations or data such as examples, direct experience or instruction. This paper presents a classification algorithm that learns the density of agents in an arena from the measurements of the six proximity sensors of a combined actuator sensor unit (CASU). Rules are presented that were induced by the learning algorithm after it was trained with data-sets based on the CASU's sensor data streams, which were collected during a number of experiments with bristlebots (agents) in the arena (environment). It was found that a set of rules generated by the learning algorithm is able to predict the number of bristlebots in the arena from the CASU's sensor readings with satisfactory accuracy.


Introduction
Inductive learning is the process of creating a hypothesis or a classifier from examples that can be generalized and applied to new examples (Mitchell, 1997). Generally, a classifier can make two types of classification errors in new examples for a two-class problem. It can misclassify positive instances as negative and negative instances as positive. The rate of correct predictions made by the classifier is the prediction accuracy of this classifier in the specific data-set (Kotsiantis, Pierrakeas, & Pintelas, 2004).

PUBLIC INTEREST STATEMENT
The paper describes the training of a classification algorithm on a live sensor data stream produced by six proximity sensors of a combined actuator sensor unit (CASU). The purpose of the investigation described in this paper was to assess the ability of the learning algorithm to determine the number of honeybees or bristlebots in an arena equipped with a CASU. We placed different numbers of bristlebots in the arena and analysed the rules generated by the algorithm with regard to their ability to discriminate between low and high bristlebot densities.
Learning tasks can be very different from each other in many aspects, including the kind of input or feedback they get (examples or examples with hints), the way they are tested (during the learning process or only at the end, number of allowed mistakes) and the level of control over the learning process (does it affect the world or just observe it?). Thus, many different models have been suggested for formalizing the learning process (Murthy, 1998; Quinlan, 1986).
In rule induction systems, a decision rule is defined as a sequence of Boolean clauses linked by logical AND operators that together imply membership in a particular class (Fürnkranz, 1999). The general goal is to construct the smallest rule set that is consistent with the training data-set. A large number of learned rules is usually a sign that the learning algorithm tries to "remember" the training data-set instead of discovering the assumptions that govern it (Fürnkranz, 1997). During classification, the left-hand sides of the rules are applied sequentially until one of them evaluates to true, and then the implied class label from the right-hand side of that rule is offered as the class prediction (Cleetus & Dhanya, 2014).
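The sequential application of rules described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the attribute names (`rate_one_active`, `rate_two_active`), the example thresholds and the `default` fallback are assumptions introduced only for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, float]
Condition = Callable[[Example], bool]
# A rule is a conjunction (logical AND) of conditions plus the class it implies.
Rule = Tuple[List[Condition], int]


def classify(rules: List[Rule], example: Example,
             default: Optional[int] = None) -> Optional[int]:
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(condition(example) for condition in conditions):
            return label
    # No rule fired; the fallback value is an assumption of this sketch.
    return default


# Hypothetical rule set with made-up attribute names and thresholds.
rules: List[Rule] = [
    ([lambda e: e["rate_two_active"] > 0.5], 9),
    ([lambda e: e["rate_one_active"] > 0.2,
      lambda e: e["rate_two_active"] <= 0.5], 4),
]
```

Because the rules are tried in order, an example that satisfies several rules receives the label of the earliest one.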
The type of learning described in this paper is known as supervised machine learning, which designates the search for algorithms that reason from externally supplied instances (examples) to produce general hypotheses, which then make predictions about future instances (Quinlan, 1993). The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown (Kotsiantis, 2007).
Every example in any data-set used by machine learning algorithms is represented using the same set of features. The features may be continuous or discrete. If examples are given with known labels (the corresponding correct outputs), then the learning is called supervised, in contrast to unsupervised learning, where examples are unlabelled. By applying these unsupervised (clustering) algorithms, researchers hope to discover unknown, but useful, classes of items (Jain, Murty, & Flynn, 1999).
On the other hand, zoologists and psychologists study learning in animals and humans. There are several parallels between learning in animals and machine learning where many machine learning techniques are derived from the efforts of psychologists to make more precise their theories of animal and human learning through computational models. Thus, the concepts and techniques being explored by researchers in machine learning may illuminate certain aspects of biological learning (Nilsson, 1998).
Concept learning is a central part of animal cognition that enables appropriate motor response in novel situations by generalization of former experience, possibly from a few examples (Sandin et al., 2014). Michalski claimed that to satisfy hunger, an animal must be able to classify some objects as edible despite the great variety of their forms and changes they undergo in the environment (Michalski, 1983). Thus, an intelligent system must be able to form concepts, that is, classes of entities united by some principle that might be a common use or goal (Michalski, 1986).
The work described in this paper is concerned with classification problems in which the output (the class) assumes only discrete, unordered values that represent the number of the bristlebots in the arena, while the attribute values are continuous.
This work is part of the Animal and Robot Societies Self-organise and Integrate by Social Interaction-bees and fish (ASSISI bf) project, which generates a mixed society of honeybees and artificial (robotic) agents as a novel bio-hybrid system that can achieve self-awareness, self-regulation and environmental awareness through self-organization and collective information processing (Schmickl, Szopek, & Bodi, 2013). The main goal of ASSISI bf is to establish a robotic society that is able to develop communication channels with animal societies (honeybees and fish swarms) on its own (Schmickl, Bogdan, et al., 2013). The scientific goals of ASSISI bf are to develop robots that can influence the collective behaviours of animals (bees and fish), establish an adaptive and self-organizing society built by robots and animals, enable the robots to autonomously learn the social language of the animals, establish mixed societies that pursue a common goal which can be defined by human users of the system and allow the robots to gain novel skills by incorporating the capabilities of the animals (sensors and cognition).

Learning process
Choosing and using a specific learning algorithm is a critical step. Once preliminary testing has produced satisfying results, the classifier (mapping from unlabelled instances to classes) is available for routine use. The evaluation of the classifier is most often based on prediction accuracy which is the proportion of correct predictions with regard to the total number of predictions (Kotsiantis, 2007).
There are many ways to calculate a classifier's accuracy. One way is to split the data-set, using two-thirds for training and one-third for estimating the performance (unseen data-set). In another method, known as cross-validation, the training set is divided into mutually exclusive and equal-sized subsets and, for each subset, the classifier is trained on the union of all the other subsets and tested on the subset that was left out. The average of the error rates over all subsets is then an estimate of the error rate of the classifier. This validation is computationally expensive but useful when the most accurate estimate of a classifier's error rate is required. If the error rate is too high, a variety of factors have to be examined (Vanitha & Niraimathi, 2013). Features that are relevant for the problem might be ignored by the classifier, a larger training set might be needed, the dimensionality of the problem might be too high, parameters extracted from the data-sets might need to be retuned or the selected algorithm might be inappropriate. High error rates can also occur as a consequence of imbalanced data-sets (Japkowicz & Stephen, 2002).
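The cross-validation estimate described above can be sketched as follows. This is a generic illustration under stated assumptions, not the evaluation procedure of this study: the function name `k_fold_error`, the `train_fn` interface (a function that takes training examples and labels and returns a predictor) and the choice of k = 5 are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def k_fold_error(examples: Sequence, labels: Sequence,
                 train_fn: Callable[[List, List], Callable],
                 k: int = 5, seed: int = 0) -> float:
    """Average the held-out error rate over k mutually exclusive folds."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    # Striding yields k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        # Train on the union of all other folds.
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        wrong = sum(model(examples[i]) != labels[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k


def majority_learner(xs: List, ys: List) -> Callable:
    """Toy learner (an assumption): always predicts the most frequent label."""
    most_common = max(set(ys), key=ys.count)
    return lambda x: most_common
```

The sketch assumes at least k examples, so that no fold is empty. Every example is used for testing exactly once, which is why the estimate costs k training runs.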

Brief description of the machine learning algorithm
Machine learning studies automatic techniques for making accurate predictions or choosing useful actions based on experience or observations (Alpaydin, 2010; Murphy, 2012).
Many successful applications and algorithms of machine learning exist already.
The choice of the machine learning algorithm was important because the algorithm will be implemented in the combined actuator sensor units (CASUs) in the future. Since these devices will not provide much computational power, selecting the simplest algorithm suitable for solving the given problem was a crucial decision. For this reason, we opted for the RULES-4 algorithm (Pham & Dimov, 1997), which is a widely used incremental inductive learning algorithm from the "RULES" family of automatic rule extraction systems. It allows the stored knowledge to be updated and refined rapidly when new examples become available. RULES-4 shares some features with its immediate predecessor, RULES-3 Plus (Pham & Dimov, 1996), such as the rule-forming procedure shown in Figure 1. The RULES-4 algorithm is summarized in Figure 2. The algorithm extracts rules incrementally by processing one example at a time. Figure 3 explains the interaction between short-term memory (STM) and long-term memory (LTM) in the algorithm. The learning process iterates over the examples of the training data-set and loads a new example into the STM in every step. When the STM is full, a random example is discarded from the STM to make room for the new example. The final rule set produced by the algorithm is kept in the LTM for evaluation with the test data-set.
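The STM behaviour described above (a fixed capacity with random discarding once the memory is full) can be sketched as follows. The class name `ShortTermMemory` and its interface are assumptions made for illustration; rule induction and the LTM are deliberately left out.

```python
import random
from typing import Any, List, Optional


class ShortTermMemory:
    """Fixed-size example store that evicts a random example when full."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.examples: List[Any] = []
        self._rng = random.Random(seed)

    def add(self, example: Any) -> Optional[Any]:
        """Store the example; return the discarded example, if any."""
        discarded = None
        if len(self.examples) >= self.capacity:
            # The memory is full: drop a randomly chosen stored example.
            victim = self._rng.randrange(len(self.examples))
            discarded = self.examples.pop(victim)
        self.examples.append(example)
        return discarded


stm = ShortTermMemory(capacity=3, seed=1)
discards = [stm.add(i) for i in range(10)]
```

Since the victim is chosen at random, two runs over the same training data can retain different examples, which is one reason why repeated runs can yield different rule sets.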
The STM and LTM of the algorithm work in analogy with the organization of biological memory. The STM works similarly to the human STM in that it stores a very limited amount of information (e.g. a phone number). This information is processed by the brain and the results may be transferred to the LTM, where they may be stored for a long time or removed after not being used for a longer period (Figure 4).
Thus, repeatedly occurring events are stored in permanent memory comparable to rules being stored in the rule set, which will then be used to test the strength of the algorithm based on the unseen data-set.
A similar scenario has been developed by Hunt (1982) and Sinclair (2010, 2011) with a system of buffers that enables humans to organize incoming stimuli, a LTM of essentially infinite capacity, and a short-term or working memory that does the human conscious mental work.
Pham and Dimov (1997) state that the input information needed to start the induction process in the RULES-4 algorithm includes:
(a) A set of examples examined in some of the previous iterations and retained in the STM. The size of the memory has to be predefined by the user. Initially, the STM is empty.
(b) The set of rules extracted so far and stored in the LTM. At the beginning, the LTM can be empty or can contain initial rules defined by the user. Apart from the rules themselves, statistical information is stored regarding the numbers of examples correctly classified and misclassified by each rule in the LTM. Using this information, a decision is made as to which rules to prune from the LTM.
(c) Ranges of values for the numerical attributes. As with RULES-3 Plus, the RULES-4 algorithm can also deal with numerical attributes by quantizing them. The ranges of values for these attributes are updated before processing every new example. The number of quantization levels for each range is specified by the user. For the experiments presented in this paper, the number of quantization levels was set to six. When a new example is processed, RULES-4 forms a corresponding description in which the values of all numerical attributes are represented by appropriate quantization levels. Induction is then carried out with the new description, the quantization levels being treated as any other values.
(d) The number of examples found in each class so far.
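Item (c) can be illustrated with a small quantizer whose value range is updated before each new example is described. The sketch assumes equal-width levels between the smallest and largest value seen so far; the source does not state how RULES-4 places the level boundaries, so this binning scheme and the class name `RangeQuantizer` are assumptions.

```python
from typing import Optional


class RangeQuantizer:
    """Maps a numerical attribute to one of `levels` equal-width bins."""

    def __init__(self, levels: int = 6) -> None:
        self.levels = levels
        self.low: Optional[float] = None
        self.high: Optional[float] = None

    def update(self, value: float) -> None:
        """Widen the known range so that it includes the new value."""
        self.low = value if self.low is None else min(self.low, value)
        self.high = value if self.high is None else max(self.high, value)

    def level(self, value: float) -> int:
        """Return the quantization level (0 .. levels - 1) of a value."""
        if self.low is None or self.high == self.low:
            return 0  # no usable range yet
        fraction = (value - self.low) / (self.high - self.low)
        index = int(fraction * self.levels)
        # Clamp so the range maximum and out-of-range values stay valid.
        return min(max(index, 0), self.levels - 1)


quantizer = RangeQuantizer(levels=6)
for observed in (0.0, 6.0):
    quantizer.update(observed)
```

After quantization, an attribute value such as 1.5 is handled during induction as the discrete level 1, like any other nominal value.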
It is very important for a rule induction algorithm to generate decision rules that have high predictability, which is normally measured by a function called rule quality. Such a function, for example the H-measure used in the RULES-3 algorithm (Pham & Dimov, 1996), is needed in both the rule induction and the classification processes. The quality of a rule relates to its accuracy (Lee, 1994).
The important question to ask, when dealing with machine learning algorithms, is not whether a learning algorithm is superior to others, but under which conditions a particular method can significantly outperform others on a given application problem.
The objective is to utilize the strengths of one method to compensate for the weaknesses of another. If interested in the best possible classification accuracy, it would be difficult to find a single classifier that reaches a performance similar to that of a good group of classifiers. There is a tradeoff between classification accuracy and the computational effort to increase accuracy without decreasing comprehensibility (Guyon & Elisseeff, 2003), where the point to which the classification accuracy needs to be optimized depends on the type of application.
to heat or vibration can be employed to provide the animals with an interactive environment. Figure 5 shows a picture of the CASU used in this research. The idea behind the integration of the CASUs into animal societies is to adapt the behaviour of the CASUs to influence the behaviour of the animals, for example, to make young honeybees aggregate at a number of target points designated by heat or vibration or to make them choose between targets of different quality. In order to achieve the desired behaviours in the animal societies, evolutionary algorithms will be used as adaptation mechanisms to produce appropriate decentralized controllers for the CASUs. Figure 6 shows the mechanical design of a CASU. The CASU body consists of a lower metal part and an upper plastic part, which is the only part visible to bees. The hexagonal upper part of the CASU is equipped with six infrared (IR) proximity sensors that guarantee an all-round coverage for detecting honeybees up to a distance of 20 mm (diameter of the CASU). These sensors do not influence the bees' behaviour since bees do not see IR light (Menzel, 1977). In the experiments described in this paper, we used the six proximity sensors to detect the bristlebots in the arena.

The bristlebot
The bristlebot is one of the simplest forms of mobile robots, both in function and in construction. It is a rigid-bodied robot, the bottom of which is equipped with bristles, like the head of a brush. The majority of the bristles are inclined against the vertical, which gives the bristlebot a preferred "forward" direction when its body is vibrated and the bristles convey the vibration to the ground, thus propelling the bristlebot. Most bristlebots are powered electrically, making use of modern developments in low-mass motors and batteries.

The arena
In this study, we used a round arena as shown in Figure 7 with a floor made of wax sheets (standard wax combs as used in bee keeping) and a plastic wall coated with Teflon in order to keep the bees from climbing the wall.

Honeybees
The honeybees (Apis mellifera ssp.) presented in this paper were extracted from their hives as sealed brood with their brood comb and were reared in an incubator until hatching. We used them for experiments during the first 24 h after hatching to exploit their inability to fly and their known temperature preference at this age. Their preference for specific temperatures is fundamental to their temperature-induced aggregation behaviour, one of several forms of collective behaviour (swarm intelligence) found in honeybees.

Data description and processing
The initial collection of the data is done by recording the proximity sensors' values, which represent the intensities of the reflections of the pulsed infrared signals emitted by the IR sensors at a rate of 10 s⁻¹ to detect moving agents in the arena within a maximum range of 2.0 cm. Each data-set contains the actual number of bristlebots in the arena and 595 records collected during 59.5 s with six columns that represent the values of the six sensors of a single CASU. We opted to integrate the sensor readings of 1 min into a single prediction to achieve good classification accuracy at an acceptable rate. For evaluation, the sensor values were converted to the Boolean values 0 (no detection) and 1 (detection) based on a user-specified sensor-specific threshold which we set to 10% of the respective sensor's average value in the data-set. We repeated the experiments 109 times with bristlebot densities varying between one and nine bots in the arena. Table 1 shows a sample of the Boolean data derived from the sensor readings collected from the CASU's proximity sensors with nine bristlebots in the arena.
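The conversion of raw readings to Boolean detections can be sketched as below. The sketch follows one reading of the description, namely that a sensor's threshold equals 0.10 times that sensor's mean reading over the data-set and that a reading above the threshold counts as a detection; the function name `binarize` and the toy readings are assumptions, not recorded data.

```python
from typing import List, Sequence


def binarize(readings: Sequence[Sequence[float]],
             fraction: float = 0.10) -> List[List[int]]:
    """Convert raw sensor rows into 0/1 detections, one threshold per sensor."""
    n_sensors = len(readings[0])
    # Per-sensor mean over all records of the data-set.
    means = [sum(row[s] for row in readings) / len(readings)
             for s in range(n_sensors)]
    # Assumed interpretation: threshold = fraction * mean reading.
    thresholds = [fraction * mean for mean in means]
    return [[1 if row[s] > thresholds[s] else 0 for s in range(n_sensors)]
            for row in readings]


# Toy data with two sensors (a real record has six columns).
toy_readings = [[0.0, 100.0], [0.0, 0.0], [30.0, 0.0], [0.0, 0.0]]
toy_binary = binarize(toy_readings)
```

With 10 readings per second, the 595 records of one data-set correspond to the stated 59.5 s of recording (595 / 10 = 59.5).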
In order to infer the number of bristlebots in the arena from the sensor activity, we defined a set of suitable continuous attributes that are based on the rate of activity (number of active states per number of measurements) of individual sensors and combinations of sensors. For example, one set of attributes is based on the number of concurrently active sensors. A sensor was considered active whenever its readout value exceeded a pre-defined threshold θ.
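One way to compute activity-rate attributes from the Boolean records is sketched below. It counts, for each record, how many sensors are active at the same time and reports the share of records with exactly k active sensors. The function name `concurrency_rates` and the "exactly k" definition are assumptions; the attributes actually used are those listed in Table 2.

```python
from typing import Dict, List, Sequence


def concurrency_rates(binary_rows: Sequence[Sequence[int]],
                      n_sensors: int = 6) -> Dict[int, float]:
    """Share of measurements in which exactly k sensors are active."""
    counts: List[int] = [0] * (n_sensors + 1)
    for row in binary_rows:
        counts[sum(row)] += 1  # number of concurrently active sensors
    total = len(binary_rows)
    return {k: counts[k] / total for k in range(n_sensors + 1)}


# Four toy records of six Boolean sensor states each.
toy_rows = [
    [1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
]
toy_rates = concurrency_rates(toy_rows)
```

Each rate lies between 0 and 1, so the resulting attributes are continuous, in line with the attribute type stated for this study.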
Another set of attributes is based on the number of concurrently active neighbours (2-6), semi-neighbours (2 or 3) and opposite sensors. Table 2 shows the sub-set of attributes used in the learning experiments along with the type of each attribute (continuous). Since three active semi-neighbours and active opposite sensors never occurred, they were not considered for this study and were thus excluded from the set of evaluated attributes.
Based on the sub-set of attributes, we created a new data-set with 109 examples, 9 attributes and 9 discrete classes (the number of bristlebots ranging from one to nine). For the learning process, the data-set was split into a training set (70 examples, 64%) and a test set (39 examples, 36%). Table 3 shows the description of the data-set used in these experiments. Table 4 shows a sample of the data-set used in the learning experiments.

Experiments CASU-bristlebots
The purpose of the learning experiments is to assess the classification algorithm's ability to classify the group size based on the live sensor data streams that are produced by the CASU's proximity sensors. The rules generated by the learning algorithm will enable the CASU to know how many honeybees or bristlebots are around it. Figure 8 shows an image of nine bristlebots in the arena with a diameter of 10 cm (Figure 8(A)), the detection ranges of the six sensors (Figure 8(B)) and the activity of one sensor during the course of a 1-min experiment (Figure 8(C)). Due to the availability of synchronously recorded videos, it is possible to synchronize the position of the bristlebots with the sensor readings they trigger. Figure 9 shows three different scenarios of a CASU in the arena with different types of agents.
The first scenario (Figure 9(A)) shows honeybees in a larger arena (d = 60 cm) with a simulated CASU that emulates the detections of a real CASU with the help of a visual tracking system, in order to simulate CASU-based detection of honeybees in the arena for later use. The second scenario (Figure 9(B)) shows a CASU with bristlebots in the arena. Figure 9(C) shows a CASU with honeybees in the arena. This paper concentrates on the second scenario, the CASU with bristlebots, and describes how the classification algorithm would enable the CASUs to determine the number of bristlebots in the arena. The algorithm will be trained on the live sensor data stream(s) produced by the CASUs' sensors to detect the number of bristlebots around the CASU, from which it will be able to infer the number of bristlebots in the arena.
The goal of the experiments was to change the number of bristlebots in the arena and test whether the algorithm can successfully discriminate between low and high densities of agents in the arena. For the further development of the CASUs, it is crucial to demonstrate their ability to learn on the fly how to evaluate the number of agents in the arena based on the data stream generated by their six proximity sensors. The algorithm is designed to accept the maximum number of conditions in one rule and the maximum number of examples in the STM as user-specified parameters. These two parameters influence the accuracy, the number of generated rules and the coverage of all classes in the rule set, which is very important for the evaluation. The learning experiments were repeated 16,384 times and, from the generated rule sets, we used those with the best classification accuracy after discarding all rule sets that did not cover all classes. The large number of repetitions is due to the nature of the incremental RULES-4 algorithm, where a new hypothesis is generated in every run. Thus, the probability of finding a good rule set grows with the number of runs. Table 5 shows the results of running the algorithm on the collected data-set described in Table 3. The results of three conditions per rule are almost identical to the results of two conditions per rule; therefore, only experiments with one and two conditions were evaluated in this study. In addition, no accuracy improvement was observed for STM sizes of more than 15. Figure 10 shows an exemplary rule set generated with one condition per rule. Figures 11 and 12 show histograms of the frequencies of classification errors in the range [−9, +9] (−9 indicates an underestimation of 9 bristlebots, i.e. 9 bots were in the arena and none were detected, and similarly +9 indicates an overestimation of 9 bots) made when applying the best rule set retrieved from the iterations of the algorithm with one and two conditions per rule.
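The selection step described above (discard every rule set that does not cover all classes, then keep the most accurate of the rest) can be sketched as follows. The `RunResult` record and the function `select_best_run` are hypothetical names used only for this illustration, and the accuracy values in the example are made up.

```python
from typing import FrozenSet, Iterable, NamedTuple, Optional


class RunResult(NamedTuple):
    """Summary of one learning run (hypothetical record layout)."""
    accuracy: float
    covered_classes: FrozenSet[int]


def select_best_run(runs: Iterable[RunResult],
                    all_classes: Iterable[int]) -> Optional[RunResult]:
    """Keep runs whose rules cover every class; return the most accurate."""
    required = set(all_classes)
    complete = [run for run in runs if required <= run.covered_classes]
    if not complete:
        return None
    return max(complete, key=lambda run: run.accuracy)


# Toy runs over three classes; the first run is the most accurate
# but misses class 3, so it is discarded.
toy_runs = [
    RunResult(0.5, frozenset({1, 2})),
    RunResult(0.4, frozenset({1, 2, 3})),
    RunResult(0.3, frozenset({1, 2, 3})),
]
best = select_best_run(toy_runs, all_classes=[1, 2, 3])
```

Filtering before ranking matters: a rule set that never predicts some class can still score well on accuracy alone.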
The rule set with two conditions per rule is better than that with one condition per rule when considering the number of correctly classified examples, as shown in Figure 12. The histogram of classification errors nevertheless provides important additional information about the rule sets. For instance, the rule set with one condition per rule was able to correctly classify only 34.9% of the examples, while the rule set with two conditions per rule could correctly classify 40.4% of the examples. However, when an absolute classification error of 1 can be tolerated, the rule set with one condition per rule performed better (72.5% acceptable classifications) than the rule set with two conditions (56% acceptable classifications). Therefore, the selection of the best rule set depends on the requirements of the task. A learning programme may be able to correctly classify only 50% of the instances, but it might be able to classify 90% with a maximum absolute error of 1, so that a still very good estimate of the number of robots would be available.
Increasing the STM size allows for more rules to be generated, and the larger rule sets achieve better results on the data-set the learning programme has been trained with. Table 6 shows the average accuracies over all 16,384 experiments for different STM sizes for one condition (Table 6(A)) and two conditions (Table 6(B)). The two tables show that the average accuracy improves with growing STM size (Figure 13), which means that larger STM sizes result in better rules being generated.
The correlation between STM size and classification accuracy is due to the fact that the rule learning process starts when the STM is full, so that a larger STM provides more examples to start learning from.
On the other hand, increasing the accuracy also leads to an increase in the number of examples classified correctly (prediction error = 0) as shown in Table 7. The table also shows the number of acceptable classifications, a measure which allows some fuzziness in the classification as it tolerates a certain classification error. In this case, we tolerated an error of one bot, so that the number of acceptable classifications is the number of correct classifications plus the number of underestimations by one (i.e. one bot too few) plus the number of overestimations by one (i.e. one bot too many). Figure 14 shows a graph of the proportion of correct and acceptable classifications for rules with one condition, for which the best result in terms of both correct and acceptable classifications is achieved when the STM size is 12. The results for rules with two conditions are shown in Table 8 and Figure 15.
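The two measures can be computed from the signed classification errors, as sketched below. The error is taken as predicted minus actual, so that negative values are underestimations, matching the histogram convention of Figures 11 and 12. The function name `classification_summary` and the toy numbers are assumptions, not results of this study.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def classification_summary(
    actual: Sequence[int], predicted: Sequence[int], tolerance: int = 1
) -> Tuple[float, float, Dict[int, int]]:
    """Return (share correct, share acceptable, histogram of signed errors)."""
    # Negative error: too few bots predicted; positive: too many.
    errors = [p - a for a, p in zip(actual, predicted)]
    total = len(errors)
    correct = sum(1 for e in errors if e == 0)
    # Acceptable = correct plus errors within the tolerance (here one bot).
    acceptable = sum(1 for e in errors if abs(e) <= tolerance)
    return correct / total, acceptable / total, dict(Counter(errors))


# Toy example with four experiments.
share_correct, share_acceptable, histogram = classification_summary(
    actual=[3, 5, 9, 1], predicted=[3, 4, 7, 2]
)
```

In the toy example the errors are 0, −1, −2 and +1, so one of four predictions is correct and three of four are acceptable.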
These results are promising as they indicate that the rule sets generated by the learning algorithm are capable of predicting the number of agents in the arena with sufficient accuracy and that the learning algorithm itself can be implemented in the CASUs to allow them to learn on the fly the number of bristlebots in the arena.

Conclusions and future work
With the help of a simple machine learning algorithm, RULES-4, the density of bristlebots in the arena can be determined from the sensor readings of a few short-range proximity detectors. Although the algorithm is incremental, it can deal with both large and small data-sets, so that the quality of the generated rule sets does not depend on the size of the data-set. The algorithm can be used in the ASSISI bf project to allow a CASU to estimate the number of bees in the arena by implementing the results of the learning programme or even the learning algorithm itself in the software of the CASUs. It will also be interesting to use the optimized attributes and their thresholds in a simulation of virtual CASUs in recordings of the bee arena with known numbers of bees (i.e. an algorithm calculates the number of bees based on the attributes) and then evaluate the results to determine the quality of the predictions. In a first step, the best rule sets generated in this study can be implemented in the CASUs' control software in order to optimize their ability to predict the number of honeybees in the arena.
The average accuracy of the learning algorithm could be improved by altering the algorithm's user-specified parameters.
The implementation of an on-the-fly learning system in a stationary robot that enables it to dynamically adapt its behaviour to the number of nearby agents is a novel approach that will provide a new level of interaction between stationary robots and mobile agents.
Future work will concentrate on experiments with honeybees. The learning algorithm will be used to determine whether a bee near the CASU is standing, moving slowly or fast, or rotating around the CASU (including information on the orientation). In order to accelerate our experiments with the CASUs, we will employ several arenas next to each other, each of which will be equipped with a CASU. This will enable us to run several experiments in parallel and thus increase the number of experimental runs per unit of time.