Artificial Immune Systems, Dynamic Fitness Landscapes, and the Change Detection Problem

Bio-inspired computational algorithms are always hot research topics in artificial intelligence communities. Biology is a bewildering source of inspiration for the design of intelligent artifacts that are capable of efficient and autonomous operation in unknown and changing environments. It is difficult to resist the fascination of creating artifacts that display elements of lifelike intelligence, thus needing techniques for control, optimization, prediction, security, design


Introduction
Letting biological processes, behaviors and structures inspire the design of problem-solving algorithms and devices has been a prominent and persistent theme in engineering and the applied sciences over the last few decades. Within this context, bio-inspired computing has taken a pioneering role. Fields such as evolutionary computing (1; 8; 25), artificial immune systems (4; 6; 43), membrane computing (29) or swarm systems (9; 22) have outgrown their infancy and found theoretical grounding as well as important applications. That these fields advanced into their current form, and the way they did so, is due to three major developments: (i) the advent of cheap, fast and reliable computational power in the form of digital computers, (ii) the understanding that computational power combined with an algorithmic approach creates potent problem solvers, and (iii) the insight that biological systems can be fruitfully understood as information-processing units and can hence frequently be employed for computational and/or algorithmic purposes. This trend is of course not to be confused with computational biology, but it is closely related to it and probably unthinkable without the fundamental progress towards algorithmization and mathematization in biology, see e.g. (5; 16; 21; 38) for some recent discussion. Among the mentioned fields of bio-inspired computing, evolutionary algorithms and artificial immune systems play a unique role as their history is particularly long and the maturity reached is notably high. In this paper we use both schemes in combination to solve the intertwined problems of maximum tracking and change detection in dynamic optimization.
For successfully solving dynamic optimization problems by evolutionary computation, the standard algorithmic structure needs additions, namely operators that maintain and enhance population diversity. Dynamic optimization here means that the topology of the associated fitness landscape changes with time. A considerable number of these operators for diversity management (for instance memory schemes, random immigrants or hyper-mutation (24; 26; 30; 33; 35; 39; 45)) can only be triggered, and hence made to work properly, if the points in time at which the fitness landscape changes are known. So, the problem of change detection is of high practical relevance in solving dynamic optimization problems (3; 19; 27).
In principle, change detection is based on information about the fitness values of points in the search space, extracted from the fitness landscape. This extraction can be done in two ways. One is to use the fitness evaluations of the evolutionary algorithm's population, which is called population-based detection; the other is to use additional measurements of the landscape's fitness at prescribed points (26), which is called sensor-based detection. Recently, a study (34) compared both types of methods. It showed that population-based detection needs no additional fitness function evaluations, but elaborate statistical tests have to be carried out. Sensor-based detection, on the other hand, can forgo these tests, but at the cost of repeating measurements in the fitness landscape. Irrespective of the quality of the change detection, using statistical tests on population-based fitness data is objectionable in principle. Although non-parametric statistical tests fit the non-Gaussian nature of the fitness distribution, they require independent samples to be accurately employed. This independence might not be given for fitness distributions from sequential generations. The fitness values of the next generation's population have their origin in the current generation and are only partly affected by the stochastic influence driving the evolutionary algorithm. This situation might be different if a (randomly induced) change in the fitness landscape has occurred, but again there is no guarantee that the resulting fitness distributions are statistically independent. For these reasons, it would be desirable to have alternative methods. A promising option is the use of methods from artificial immune systems, particularly negative selection (4; 7; 13; 18; 44). These algorithms have been successfully employed to solve similar problems, for instance network security, computer virus detection, network intrusion detection and fault diagnosis, see e.g. (13; 18; 43). In this paper, we present an immunological approach to change detection in dynamic fitness landscapes.
The paper is organized as follows. In the next two sections, dynamic fitness landscapes are introduced and the change detection problem is defined. Then, in section 4, the immunological change detection scheme is given and its main components (shape space, affinity function, detector generation and detection processing) are described. In section 5, we present numerical experiments with the scheme and use receiver-operating characteristics (ROC) as well as the area under the ROC curve (AUC) as analysis tools. We end by summarizing the findings and pointing to future work.

Dynamics and fitness landscapes
The concept of fitness landscapes is an important approach to foster theoretical understanding in evolutionary computation (20; 40; 41). Such landscapes are traditionally considered to be static and can be obtained from either a genotype-to-fitness mapping or, more generally, by encoding all possible solutions of the optimization problem and giving a fitness value to each solution. All the possible solutions span a search space S, while a fitness function f(s): S → R provides every point s ∈ S with a fitness value. In case of a genotype-to-fitness mapping, S coincides with the genotypical space. If the search space S is not metric, we must explain which solutions we would obtain if we were to slightly modify a possible solution s ∈ S (and hence were to move it locally in the search space). This is done by a neighborhood structure n(s), which gives every point in the search space a set of direct and possibly also more distant neighbors.
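To make the three ingredients concrete, the following is a minimal sketch of a static landscape over binary strings. The choice of a Hamming-1 neighborhood and of the toy fitness function (counting ones) are assumptions made purely for illustration; the names `neighbors`, `fitness` and `is_local_optimum` are hypothetical and do not come from the paper.

```python
from itertools import product

def neighbors(s):
    """Neighborhood structure n(s): all strings that differ from s in one bit."""
    return [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]

def fitness(s):
    """Toy fitness function f(s): S -> R, here simply the number of ones."""
    return sum(s)

# Search space S: all binary strings of length 3, encoded as tuples.
S = list(product((0, 1), repeat=3))

def is_local_optimum(s):
    """A point is a local optimum if no direct neighbor has higher fitness."""
    return all(fitness(s) >= fitness(q) for q in neighbors(s))

local_optima = [s for s in S if is_local_optimum(s)]
```

With a different neighborhood structure the same S and f(s) can yield a different set of local optima, which is why n(s) is part of the landscape definition.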
If the fitness landscape is dynamic, all three of its defining ingredients (search space S, fitness function f(s), neighborhood structure n(s)) can, in principle, change with time. So, for its description we additionally need a time set and mappings that tell how S, f(s) and/or n(s) evolve with time (31; 32; 37). Dynamic optimization problems considered in the literature so far address all these possibilities of change to some extent. Whereas a real alteration of the fundamental components of a search space such as dimensionality or representation (binary, integer, discrete, real, etc.) is rare, a change in the feasibility of individuals is another, less substantial kind of dynamic search space and is discussed within the problem setting of dynamic constraints (28; 36). The works on dynamic routing can partially be interpreted as a changing neighborhood structure (2; 15), while most of the work so far has been devoted to time-dependent fitness functions (24; 26; 30; 33; 35; 39; 45), which will also be the focus of this paper.
In dynamic optimization problems (DOPs), the fitness landscape [...] Usually, γ is considered to be constant for all generations t, but it might also be a function of k and even be different (for instance a positive integer realization of a random process) for every k. Note that we require more than one generation in between landscape changes, γ > 1, and hence $k = \lfloor \gamma^{-1} t \rfloor$.

The change detection problem
From (3), we see that the fitness landscape changes every γ generations. As the temporal patterns of these changes are assumed not to be explicitly known, our interest is now to infer from the fitness values of the individuals $p_i(t) \in P(t)|_{t=\gamma k}$ whether a change in the fitness landscape has occurred or not. This we call the change detection problem in dynamic fitness landscapes using fitness data from the population. More explicitly, we want to detect the change point $t_{cp}$ with the property that there exists an $x \in M$ for which
$$f(x, \lfloor \gamma^{-1}(t_{cp} - 1) \rfloor) \neq f(x, \lfloor \gamma^{-1} t_{cp} \rfloor). \qquad (5)$$
Our convention is to define the change point $t_{cp}$ in the generational time scale t, as we base the detection solely on the fitness values $f(x, \lfloor \gamma^{-1} t \rfloor)$ of the population P(t). From (3) it follows that $\lfloor \gamma^{-1}(t_{cp} - 1) \rfloor = k - 1$ and $\lfloor \gamma^{-1} t_{cp} \rfloor = k$, that is, for every integer $\gamma^{-1} t$ there is a change in the fitness landscape (1).
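The relation between the two time scales can be illustrated with a short sketch. It assumes a constant integer change frequency γ and uses the relation k = ⌊t/γ⌋ stated in the text; the function names `landscape_time` and `is_change_point` are hypothetical.

```python
def landscape_time(t, gamma):
    """Landscape time k = floor(t / gamma) for generational time t."""
    return t // gamma

def is_change_point(t, gamma):
    """True if the landscape time differs between generations t - 1 and t."""
    return t > 0 and landscape_time(t, gamma) != landscape_time(t - 1, gamma)

gamma = 20  # change frequency: the landscape changes every 20 generations
change_points = [t for t in range(100) if is_change_point(t, gamma)]
```

In an experiment with a known γ, such a list provides the ground truth against which the decisions of a change detector can be compared.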
The change point definition (5) says that a change in the fitness landscape has happened no matter how small and insignificant the alteration in the landscape's topology actually is. From a computational point of view this raises some problems regarding practical detectability. Change detection based on population data assumes that a change in the fitness landscape affects a substantial number of its points and makes their fitness values increase or decrease. Moreover, generally P(t) ≠ P(t − 1), so that condition (5) cannot be checked directly on the population. Another aspect is that the given framework is unsuitable for discriminating between small but gradual changes and larger but abrupt changes. Such a distinction can only be made in terms of the fitness landscape considered. The treatment and discussion presented here is intended to apply to fitness landscapes that undergo abrupt and substantial changes in their topology. Hence, it can reasonably be assumed that these changes are practically detectable. We exclude small but gradual changes in the fitness landscape (for instance those resulting from the presence of noise and/or other perturbations in the fitness evaluation process). This is in line with the application context of the change detection scheme considered here: it should help to trigger and control the diversity enhancement and maintenance of the EA. For fitness landscapes with small but gradual changes, an additional change-activated diversity management does not play a prominent role anyway; other types of EAs (particularly those emphasizing robustness, such as self-adaptation) are found to be more apt here.
As shown above, it is generally not possible to verify condition (5) directly. The basic idea behind using the fitness values $f(p_i, k)|_{k=\lfloor \gamma^{-1} t \rfloor}$ of the population's individuals $p_i(t) \in P(t)$ is that these quantities form a fitness distribution F(t) that can be analyzed by itself or compared to the preceding ones, that is, to a time window of preceding distributions F(t − 1), F(t − 2), and so on. The fitness distributions can be regarded as a data stream, and monitoring this data stream should make visible the normal optimum-finding mode of the EA but also reflect that this normal mode is disrupted if a change in the fitness landscape has occurred (and hence results in a different pattern when evaluating F(t) and F(t − 1), respectively). Statistically speaking, the considered data set F(t) can be regarded as coming from an unknown distribution D(t). This transforms the problem of change detection into the problem of testing whether the data sets F(t) and F(t − 1), or any data set created from a time window including F(t), come from different distributions or not, which is known as statistical hypothesis testing. This connection is widely applied in solving change detection problems, e.g. (14; 23). The obvious question here is which test can tell us whether D(t) is different from D(t − 1), and whether this difference necessarily and sufficiently implies that a change has occurred. In the language of statistical hypothesis testing, the test should ideally show only true changes, that is, have no false positives, and indicate all of them, that is, have no false negatives. However, statistical hypothesis testing methods regularly require that the samples F(t) are independent of F(t − 1). This is most likely not the case if the samples come from a moving population of an evolutionary algorithm. The data set F(t) includes the fitness values of an evolving population and represents two types of interfering population dynamics. An evolving population moves (ideally and in the best case) monotonically towards the optima and in doing so changes its mean and standard deviation. Such a convergence behavior of the EA, which is the desired and intended working mode, is again a statistical phenomenon. Moreover, the fitness values of F(t) are a direct result of the values of F(t − 1) and therefore can hardly be regarded as independent of each other. A second aspect is the reaction to a change in the fitness landscape, which can more likely be seen as independent if the dynamics is a stochastic process. For all these reasons we look for an alternative to statistical hypothesis testing. So, we intend to use ideas from artificial immunology for solving the change detection problem in dynamic fitness landscapes.

The immunological change detector
Artificial immune systems (AIS) are soft computing algorithms that take their inspiration from, and mimic, the working principles and functions of their biological counterparts (43; 44). AIS date back to the 1980s (11) and were initiated by an increasing theoretical understanding of the natural immune system in connection with a strong interest in utilizing biological processes for computational purposes. These algorithms are capable of adaptation, learning and memory and have been applied to problem solving in areas as different as classification, pattern recognition and data mining/analysis (6; 13; 18; 43). Among the different types of AIS, negative selection algorithms have assumed a prominent role in solving so-called anomaly detection problems (7; 10; 13; 18; 42). Here, anomaly detection means distinguishing the normal behavior of a dynamic process, usually characterized by some (external) model, from anomalies defined by deviations from that model. In the following, we review negative selection and show how it can be used for detecting change in dynamic fitness landscapes.
Negative selection is anchored in the concept of a shape space that represents the observable features of the dynamic process for which a change in behavior needs to be detected. Within that shape space, we define a set of self elements that stem from the normal behavior. From these self elements, in turn, a set of detectors is derived, usually by using some training data, that must not match any sample of the self set. Subsequently, the detectors are used to decide whether incoming new feature data from the dynamic process is normal (self) or not (non-self). Thus, negative selection mimics the self/non-self discrimination of the natural immune system, see Fig. 1, which shows self and non-self elements in a two-dimensional shape space. We now describe the main components of the negative selection algorithm: shape space, affinity measure, detector generation and detection process.
i.) Shape space. The shape space of a negative selection algorithm is a representation of the data coming from the dynamic process under study and can be either a string over a finite alphabet (for instance a binary string, which has been used in a large number of previous works, see e.g. (18)) or real-valued (13). As the basis for the change detection, the fitness distribution F(t), is real-valued, it seems straightforward to use a real-valued shape space $S = [0, 1]^m$ here, where m is its dimension. Dimensionality of the shape space is an important parameter influencing computational effort and performance of the detection scheme (42).
The dimensionality of the fitness distribution F(t) equals the number of individuals in the population, µ. So, for the reason given above, it appears sensible to pre-process the data from the fitness distribution with the aims of both reducing dimensionality and extracting the most meaningful information about when the landscape has changed. Among the several conceivable ways to do the pre-processing, we here consider the following scheme. The distributions F(t) and F(t − 1) are independently sorted according to their fitness values. Then the m/2 highest and m/2 lowest ranked elements of both sorted distributions are taken, the elements coming from F(t − 1) are subtracted from the ones from F(t), and finally these calculated quantities (which reflect the difference between two consecutive fitness distributions) are normalized to the interval [0, 1]. Hence, the result is a point in the m-dimensional shape space. The given pre-processing procedure is motivated by the common-sense arguments that a change in the fitness landscape particularly affects the magnitude of the best and the worst fitness values, and that their relative difference from one generation to the next tells whether a standard evolutionary search or a reaction to a landscape change has taken place. Note that by this pre-processing a metric on the fitness distributions is defined.
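The pre-processing steps can be sketched as follows. The text does not state how the differences are normalized to [0, 1]; this sketch assumes a fixed, user-supplied bound `f_range` on the absolute fitness difference and maps a difference of zero to 0.5. The function name `shape_space_point` and the parameter `f_range` are hypothetical.

```python
def shape_space_point(f_now, f_prev, m, f_range):
    """Map two consecutive fitness distributions to a point in [0, 1]^m."""
    if m < 2 or m % 2 != 0:
        raise ValueError("m must be an even integer >= 2")
    half = m // 2
    # Sort both distributions independently by fitness value.
    now, prev = sorted(f_now), sorted(f_prev)
    # Keep the m/2 lowest and the m/2 highest ranked values of each.
    sel_now = now[:half] + now[-half:]
    sel_prev = prev[:half] + prev[-half:]
    point = []
    for a, b in zip(sel_now, sel_prev):
        # Assumed normalization: differences in [-f_range, f_range] -> [0, 1].
        scaled = 0.5 + (a - b) / (2.0 * f_range)
        point.append(min(1.0, max(0.0, scaled)))  # clip to the unit interval
    return point

# F(t - 1) = [1, 2, 3, 4] and F(t) = [2, 3, 4, 5]: every value rose by 1.
point = shape_space_point([2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0],
                          m=2, f_range=10.0)
```

A per-sample min-max normalization would also map to [0, 1], but it would discard the magnitude of the differences, which is why a fixed scale is assumed here.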
ii.) Affinity measure. The affinity measure states the degree of matching and recognition between elements (that is, points) in the shape space. In other words, the affinity measure describes to what degree elements in the shape space differ. Every element in the shape space is defined by its center point $c \in [0, 1]^m$ and a matching radius r. According to their function in the immunological detection and classification process, there are three types of elements in the shape space: (a) self elements se = (cs, rs) that are samples known to belong to the self space (usually from a training data set), (b) detectors dt = (cd, rd) that are derived from the self elements and must not match them, and (c) incoming data samples id = (ci, ri) that must be classified as belonging to either the self set or not.
Self elements, detectors and incoming data are also known by their immunologically motivated terms self cells, antibodies and antigens, respectively. There is a large number of different affinity measures, see e.g. (18) for an overview. We here use Euclidean distance, so that there is a match if $\|c_i - c_j\| < r_i + r_j$, with the indices i, j denoting different elements of the shape space.
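The matching rule amounts to testing whether two hyperspheres overlap. A minimal sketch, with the hypothetical function name `matches`:

```python
import math

def matches(c_i, r_i, c_j, r_j):
    """Two elements match if their centers are closer than the sum of radii."""
    return math.dist(c_i, c_j) < r_i + r_j

# Centers 0.3 apart: a match for radii 0.2 + 0.2, no match for 0.1 + 0.1.
overlap = matches((0.0, 0.0), 0.2, (0.3, 0.0), 0.2)
no_overlap = matches((0.0, 0.0), 0.1, (0.3, 0.0), 0.1)
```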
iii.) Detector generation. In detector generation, self elements that come from training data with known self/non-self discrimination are used to calculate detector center points and radii. As with the affinity measure, there is a multitude of different detector generation mechanisms, see e.g. (18). In some initial experiments (which are not reported here for the sake of brevity), a scheme mainly using ideas from v-detectors (17) showed the best results and is considered here. This scheme has the advantage of addressing the problem of coverage of the non-self space in the generation process and of maximizing the size of individual detectors to achieve a larger coverage. This comes at the cost of making the number of detectors actually created an unpredictable result of the generation process rather than a parameter to be set initially.
The scheme works as follows. Input is a collection of self elements $se_j$, with $j = 1, 2, \ldots, \#_{se}$, where $\#_{se}$ is the total number of self elements. Further, we set a target coverage α and calculate the quantity $h = \frac{1}{1-\alpha}$. Then, the following steps are repeated. A candidate detector point and its radius are generated as a realization of a uniformly distributed random variable. It is tested whether the detector matches any of the self elements. If so, its radius is shrunk so that no match remains. If not, the radius is enlarged up to the limit of a match. After that, it is tested whether the candidate detector with its updated radius is entirely covered by a detector that has already passed this test. If so, the candidate is discarded; otherwise it is saved to the set of detectors that passed the coverage test. These steps are repeated until candidate detectors fail the coverage test h times in a row. The saved detector candidates are accepted for the change detection process.
iv.) Detection process. After the training time in which detectors are generated as described above, the immunological change detector can be used for deciding whether a change point according to eq. (5) has been reached, by monitoring a metric defined on the fitness distributions F(t) and F(t − 1). The necessary pre-processing is the same as that given for the training phase in Sec. 4.i. So, an incoming data sample id(t) = (ci(t), ri) is produced every generation t, where the center point ci(t) comes from the pre-processing and the self radius ri defines the sensitivity of the detector; ri is set when initializing the immunological change detector and will be examined in the numerical experiments reported below. The affinity function w(t) of eq. (7) is calculated from the individual affinities $\mathrm{aff}(id(t), dt_j)$, which use a Euclidean distance measure, where $\#_{det}$ is the total number of detectors $dt_j$ and β is a weighting factor. From the values w(t), a change point can be inferred: a threshold value w has to be set, and w(t) > w indicates a change.
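The detector generation steps of Sec. 4.iii can be sketched as below. This is an illustration under stated assumptions, not the implementation used in the experiments: all self elements share one radius `r_self`, the shrink and enlarge steps are merged by setting the radius directly to the limit of a match, candidates whose center lies inside a self element are simply discarded, and a hypothetical `max_tries` bound guarantees termination. The name `generate_detectors` is likewise hypothetical.

```python
import math
import random

def generate_detectors(self_centers, r_self, m, alpha, rng, max_tries=100000):
    """Generate variable-sized detectors as (center, radius) pairs."""
    h = round(1.0 / (1.0 - alpha))  # stop after h covered candidates in a row
    detectors = []
    covered_in_a_row = 0
    for _ in range(max_tries):
        if covered_in_a_row >= h:
            break
        # Candidate center drawn uniformly from the shape space [0, 1]^m.
        c = [rng.random() for _ in range(m)]
        # Radius at the limit of a match with the nearest self element.
        r = min(math.dist(c, s) for s in self_centers) - r_self
        if r <= 0:
            continue  # assumption: centers inside the self region are dropped
        # Coverage test: candidate lies entirely inside an accepted detector.
        if any(math.dist(c, cd) + r <= rd for cd, rd in detectors):
            covered_in_a_row += 1
        else:
            detectors.append((c, r))
            covered_in_a_row = 0
    return detectors

self_centers = [[0.5, 0.5]]
detectors = generate_detectors(self_centers, r_self=0.2, m=2, alpha=0.9,
                               rng=random.Random(0))
```

Because the stopping rule depends on random candidates, the number of detectors returned varies from run to run, which is the behavior the text describes.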
v.) Performance evaluation. To evaluate the success and the quality of the change detection, the method of receiver-operating characteristics (ROC) curves can be used, e.g. (12). ROC curves are a tool for organizing and visualizing classifications together with their performances. So, they can be used to analyze and depict the relative trade-offs between the benefits of the schemes (correctly identified instances according to the classification) and the costs (incorrect identifications). That makes them particularly useful for assessing change detection schemes. The classification here is between positive and negative change detections. Hence, we can define the following performance metrics. If there is a positive detection and a change in the fitness landscape has happened, it is counted as a true positive (tp); if a change happened but is not detected, it is a false negative (fn). If, on the other hand, no change has happened and the detection is negative, it is a true negative (tn); a positive detection in this situation yields a false positive (fp). For this two-by-two change classification, we obtain as the elements of the performance metrics the tp rate
$$tp \approx \frac{\text{correctly identified changes}}{\text{total changes}} \qquad (8)$$
and fn = 1 − tp, as well as the fp rate
$$fp \approx \frac{\text{incorrectly identified changes}}{\text{total non-changes}} \qquad (9)$$
and tn = 1 − fp. In the ROC plot, the tp rate is given (on the ordinate) versus the fp rate (on the abscissa). Hence, the tp and fp rates of the immunological change detector for a given threshold value w give a point in the ROC space; ROC curves are obtained by plotting the rates for varying threshold values w.
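The rates of eqs. (8) and (9) can be computed from a sequence of affinity values and the known change points as in the following sketch. The function name `roc_point` and the example data are hypothetical, and both classes (change and no change) are assumed to be present in the data.

```python
def roc_point(w, changed, threshold):
    """Return (fp rate, tp rate) for one threshold value.

    w        -- affinity values w(t), one per generation
    changed  -- booleans, True where a landscape change actually happened
    """
    tp = fp = positives = negatives = 0
    for w_t, is_change in zip(w, changed):
        detected = w_t > threshold  # positive detection
        if is_change:
            positives += 1
            tp += detected
        else:
            negatives += 1
            fp += detected
    return fp / negatives, tp / positives

w = [0.1, 0.9, 0.2, 0.8, 0.6]
changed = [False, True, False, True, False]
fp_rate, tp_rate = roc_point(w, changed, threshold=0.5)
```

Calling `roc_point` for a range of threshold values yields the points of one ROC curve.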

Experimental results
In the following we report numerical experiments with the change detection scheme described above. In the experiments, we use as dynamic fitness landscape a "field of cones on a zero plane", where N cones with coordinates $c_i(k)$, $i = 1, \ldots, N$, are moving with discrete time $k \in \mathbb{N}_0$. These cones are distributed across the landscape and have randomly chosen initial coordinates $c_i(0)$, heights $h_i$, and slopes $s_i$, see Fig. 2. So, we get
$$f(x, k) = \max\left\{0, \max_{1 \le i \le N}\left[h_i - s_i \|x - c_i(k)\|\right]\right\},$$
where the number of cones is N = 50 and the dimension of the search space is n = 2. The dynamics of the moving sequence for the cones' coordinates is in most experiments normally random, that is, each $c_i(k)$ for each k is an independent realization of a normally distributed random variable. In the last set of experiments we also consider different kinds of dynamics, namely regular, circle-like (and hence completely predictable) dynamics, where the cones' coordinates form a circle and so return to the same place in search space after a certain amount of time, and chaotic dynamics, where the cones' coordinates follow the trajectory of a nonlinear dynamical system with chaotic behavior, see (30) for details of this kind of setting of landscape dynamics.
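A minimal sketch of evaluating such a field of cones at one landscape time k is given below. It assumes the common form f(x, k) = max{0, max_i [h_i − s_i ‖x − c_i(k)‖]}, with the cone coordinates for the current k passed in as `centers`; the function name `cone_field` and the example values are hypothetical.

```python
import math

def cone_field(x, centers, heights, slopes):
    """Fitness at point x: the highest cone above the zero plane."""
    best_cone = max(h - s * math.dist(x, c)
                    for c, h, s in zip(centers, heights, slopes))
    return max(0.0, best_cone)

centers = [(0.0, 0.0), (3.0, 4.0)]  # cone coordinates c_i(k) for the current k
heights = [2.0, 10.0]               # cone heights h_i
slopes = [1.0, 1.0]                 # cone slopes s_i

at_origin = cone_field((0.0, 0.0), centers, heights, slopes)
far_away = cone_field((100.0, 100.0), centers, heights, slopes)
```

At the origin the second, taller cone dominates (10 − 5 = 5 exceeds 2), which shows how a moving cone can change the fitness of a point without passing over it.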
Further, we employ an EA with a fixed number of λ = 48 individuals that uses tournament selection of tournament size 2, a fitness-related intermediate sexual recombination (which is operated λ times and works by choosing two individuals randomly to produce offspring that is the fitness-weighted arithmetic mean of both parents) and a standard mutation with the mutation rate 0.1. Note that the choice of the EA is of secondary importance as long as it solves the DOP with some success. The immunological change detector was implemented as described in Sec. 4; we set β = 5. Fig. 3 shows the detection process by monitoring the affinity function w(t) calculated according to eq. (7), for shape space dimension m = 8, coverage α = 0.999 and self radius ri = 0.2. We here use a training time of 20 generations. We see that even for this small training set, spikes in w(t) can be used as an indication of changes in the fitness landscape. As results of a second set of experiments, we give the ROC curves for varying dimension m and coverage α, see Fig. 4.
Here, training time is 400 generations with a total of 1000 generations taken into account, the self radius is ri = 0.2 and the change frequency is γ = 20. The tp and fp rates are calculated according to eqs. (8) and (9) and are means over 100 repetitions. In the ROC space, the lower left point (0, 0) represents a change detection that never produces any positive decision. It makes no false positive error but also yields no true positives. Conversely, a detection represented by the upper right point (1, 1) produces only positive decisions; it finds all true positives but also makes a false positive error in every non-change case. The line between these two points in the ROC space can be regarded as expressing a purely random guessing strategy to decide whether or not a change has happened. Any classification that is represented by a point below that line is worse than random guessing, while classifications above are better, the more so the further north-west the point lies, with the point (0, 1) expressing perfect classification.
With this in mind, we see from Fig. 4 that good detection results are achieved, as we obtain curves that climb vertically from the point (0, 0) towards (0, 1) for a considerable range of threshold values before bending off to (1, 1). Further, it can be seen in Fig. 4a that a higher coverage rate produces slightly better results, but the differences are not dramatic. Also, varying the shape space dimension leads to no substantial increase in the detection success. The curve for m = 8 is even slightly lower than that for m = 6.
An important feature of the v-detector design used in this paper is that we get detectors with variable size, but also that their exact number is not known beforehand. The number is a statistically varying result of the creation process and thus becomes a quantity that can be verified and studied experimentally. Fig. 5a shows the number of detectors $\#_{det}$ depending on the shape space dimension m, while Fig. 5b gives $\#_{det}$ over the average detector radii $r_d$. Both figures show a scatter plot for 20 subsequent detector generations each. It can be seen that generally the larger the coverage α and the dimension m, the more detectors are produced. Both facts appear quite logical, as higher coverage requires more detectors and higher dimension means a larger generalized volume to be filled by the detectors. For larger dimensions, the actual numbers of detectors are also more spread, that is, their range of variation increases. For m = 8 and α = 0.900, for instance, the difference between the lowest value ($\#_{det} = 15$) and the highest ($\#_{det} = 33$) is 18, while for m = 4 the maximal difference is 8. Results that allow a similar interpretation can also be found for the number of detectors over the average detector radii $r_d$, see Fig. 5b. Smaller coverage α produces not only a smaller number of detectors but also detectors with smaller average radii, although the ranges of radii overlap. A possible explanation is that for a higher coverage a larger number of detector candidates are produced and tested, and hence it becomes more likely that candidates with larger radii are finally found and selected. In a next experiment we study the effect of varying the self radius ri together with the influence of different kinds of landscape dynamics. To get a numerical evaluation of the ROC curves, we calculate the area under the ROC curve (AUC), which is a measure of the detection success (12). Since the AUC is a fraction of the unit square, its values are 0 ≤ AUC ≤ 1. Moreover, since random guessing gives the diagonal line in the unit square, a well-working change detection should have values AUC > 0.5. Fig. 6 shows the AUC over the self radius ri for different change frequencies γ, shape space dimension m = 8 and coverage α = 0.999, with Fig. 6a giving the results for random landscape dynamics, and Fig. 6b and Fig. 6c for regular dynamics (circle) and chaotic dynamics. The AUC is again the mean over 100 runs. We can generally see an inverted bathtub curve, which indicates that a certain interval of ri is best. In some cases there are no significant differences in the performance within this interval, for instance for random dynamics and γ = 20, while for others, for instance random dynamics and γ = 5, a clear maximum can be observed. Further, the performance is generally better for random dynamics than for regular or chaotic dynamics, and a very fast landscape dynamics, γ = 5, may produce rather inferior detection results. These results are somewhat surprising, as it is known that the optimization results for regular and chaotic dynamics do not show such significant differences (33; 35). A possible explanation is that for random dynamics in the landscape the composition of the population is more homogeneous and the population dynamics more random-like. This in turn leads to better detector design and hence more effective change detection.
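The AUC of an empirical ROC curve can be approximated with the trapezoidal rule, as in the following sketch. It assumes the curve is given as a list of (fp rate, tp rate) points and adds the end points (0, 0) and (1, 1); the function name `auc` is hypothetical, and the text does not state which integration rule was used in the experiments.

```python
def auc(points):
    """Area under a ROC curve given as (fp rate, tp rate) points."""
    # Add the trivial end points and order the points along the abscissa.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid between two points
    return area

random_guessing = auc([])         # only the diagonal from (0, 0) to (1, 1)
perfect = auc([(0.0, 1.0)])       # curve through the point (0, 1)
```

The two examples reproduce the reference values discussed above: 0.5 for random guessing and 1 for perfect classification.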

Conclusions
We have presented an immunological approach for solving the change detection problem in dynamic fitness landscapes. A negative selection algorithm has been used to decide whether or not the fitness landscape has changed. This is done solely with fitness information from the population on a sample basis. Numerical experiments evaluated by receiver-operating characteristics (ROC) curves have shown the efficiency of the scheme. An important feature of the approach is that it does not directly use any statistical test on which requirements could be imposed regarding the independence of the samples. In future work it would be interesting to compare and combine the immunological change detector with statistical tests. This could be connected with a study of dynamic fitness landscapes with higher dimension and complexity.

Fig. 1. Self and non-self elements and detectors for a two-dimensional shape space.

Fig. 5. Number of detectors $\#_{det}$ for different coverage α depending on: a) shape space dimension m; b) average detector radius mean($r_d$).

Fig. 6. AUC over self radius ri for different γ and different kinds of landscape dynamics, shape space dimension m = 8 and coverage α = 0.999: a) random landscape dynamics; b) cyclic landscape dynamics; c) chaotic landscape dynamics.
Both time scales t and k work as measuring and ordering tools for changes (t for changes in the population from one generation to the next, k for changes in the dynamic fitness landscape). As µ individuals $p_i(t) \in P(t)$, i = 1, 2, ..., µ, populate the fitness landscape (1), they can be labeled with a fitness value $f(p_i(t), k)$. Both time scales are related in the solving process of the DOP by the change frequency γ ∈ N with t = γk.