A Search History-Driven Offspring Generation Method for the Real-Coded Genetic Algorithm

In evolutionary algorithms, genetic operators iteratively generate new offspring which constitute a potentially valuable set of search history. To boost the performance of offspring generation in the real-coded genetic algorithm (RCGA), in this paper, we propose to exploit the search history cached so far in an online style during the iteration. Specifically, survivor individuals over the past few generations are collected and stored in the archive to form the search history. We introduce a simple yet effective crossover model driven by the search history (abbreviated as SHX). In particular, the search history is clustered, and each cluster is assigned a score for SHX. In essence, the proposed SHX is a data-driven method which exploits the search history to perform offspring selection after the offspring generation. Since no additional fitness evaluations are needed, SHX is favorable for the tasks with limited budget or expensive fitness evaluations. We experimentally verify the effectiveness of SHX over 15 benchmark functions. Quantitative results show that our SHX can significantly enhance the performance of RCGA, in terms of both accuracy and convergence speed. Also, the induced additional runtime is negligible compared to the total processing time.


Introduction
Evolutionary algorithms (EAs) have been shown, both theoretically [1][2][3] and practically [4][5][6], to be generic and effective for finding global optima in complex search spaces. The exploration process of EAs imitates natural selection and is realized by alternating offspring generation and survivor selection iteratively. The population quality gradually improves throughout the exploration process, which can be viewed as a stochastic, population-based generate-and-test process. Because of the offspring generation, a large number of candidate solutions (i.e., individuals) are sampled, accompanied by their fitness values, genetic information, and genealogy information. Such accumulated search data constitute a search history which can be very informative and valuable for boosting the overall performance. For instance, exploiting the search history can improve the search procedure under a limited budget of fitness evaluations (FEs), that is, when no additional FEs are allowed for improving the search performance. Also, the computational cost of a single FE can be high when the fitness function is complicated. To obtain a better solution without increasing the number of FEs, the way the search history is exploited truly matters. Nevertheless, search history has been only sparsely exploited and studied in existing methods.
The real-coded genetic algorithm (RCGA) has been widely studied in the past decades [7][8][9][10][11], and the main efforts to improve its performance have focused on the development of crossover techniques [12]. Because the crossover operator generates new offspring from the current population, the quality of the new solutions directly affects the evolution direction and convergence speed. Depending on their mechanisms, crossover methods can differ in (1) parent selection, (2) offspring generation, and (3) offspring selection. The numbers of parents and offspring can each exceed two, depending on the design. These three aspects tie the exploration ability to the exploitation ability, and the degree of and balance between both abilities largely affect the performance [13]. Although the self-adaptive feature of RCGA [14] can adjust this relationship to a certain extent, the "best" degrees and balance between exploration and exploitation for achieving a satisfactory solution can differ greatly across problem settings and can hardly be achieved by the adaptive feature alone.
With a large amount of search history data up to the current generation in hand, we introduce in this paper a crossover method that effectively exploits the history data. First, an archive is defined to collect the survivor individuals over generations as the search history. Then, the stored individuals are clustered by k-means [15], and each cluster is assigned a score depending on the number of individuals belonging to it. Finally, offspring are generated and selected according to the scores. We introduce two different schemes to update the archive. The proposed crossover operator, named search history-driven crossover (SHX), generates offspring by considering the cluster scores. Since SHX adds an offspring selection mechanism, any existing parent selection and offspring generation mechanisms can be easily integrated with it. To our knowledge, this is the first work to design a crossover model by effectively exploiting search history. We present a set of experiments to systematically evaluate the effectiveness of the proposed method using 15 benchmark functions. Three conventional crossover operators are employed, and the results with and without SHX are compared. In addition, two archive update methods are analyzed. The main technical contributions of this paper are threefold. First, we propose a novel crossover model that effectively exploits the search history. Second, we introduce offspring selection based on the clusters calculated from the search history. Third, we introduce two schemes to update the survivor archive. A preliminary version of this paper appears in GECCO2020 [16].

Related Work
Crossover is one of the principal operators for generating offspring and is closely related to the performance of the real-coded genetic algorithm (RCGA). Blend-α crossover (BLX-α) [17], proposed by Eshelman and Schaffer, is one of the most popular operators. Offspring genes are independently and uniformly sampled within an interval between a gene pair of parents. The parameter α controls the extension of the sampling interval, which plays a key role in maintaining the diversity of offspring. Eshelman et al. proposed Blend-α-β crossover (BLX-α-β) [18], which involves two extension parameters. Deb and Agrawal introduced simulated binary crossover (SBX) [19], which simulates the single-point crossover of binary-coded GA in a continuous search space. The interval used in SBX is determined by a polynomial probability distribution β depending on the distribution index η, which indirectly adjusts the tendency of offspring generation. The above crossover operators share a common feature: the offspring genes are drawn according to a certain probability distribution from a predefined interval on the parent genes. This feature enables better results in a continuous search space than crossover operators designed for binary coding. On the other hand, some crossover operators take more than two individuals as parents, aiming to generate offspring with well-preserved population statistics. In the case of unimodal normal distribution crossover (UNDX) [20], the generation of offspring follows a unimodal normal distribution defined on the line connecting two of the three parents. For simplex crossover (SPX) [21], $D + 1$ individuals are taken as parents in the $D$-dimensional search space. SPX uniformly generates offspring within the $D$-dimensional simplex constructed by the parent individuals and expanded by a parameter ε.
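As a rough illustration of the interval-based sampling described above, the following sketch implements the basic BLX-α idea for two parents. The function name `blx_alpha`, the default `alpha=0.5`, and the example parents are assumptions made for illustration only; they are not taken from [17] or from the experimental setup of this paper.

```python
import random

def blx_alpha(parent_a, parent_b, alpha=0.5, rng=random):
    """Sample one child gene by gene from the alpha-extended parent interval."""
    child = []
    for a, b in zip(parent_a, parent_b):
        low, high = min(a, b), max(a, b)
        extent = alpha * (high - low)  # extension added on both sides of the interval
        child.append(rng.uniform(low - extent, high + extent))
    return child

# Illustrative parents in a 2-dimensional search space (assumed values).
child = blx_alpha([0.0, 2.0], [1.0, 4.0], alpha=0.5, rng=random.Random(0))
```

Because every gene is sampled independently, the child lies in an axis-aligned box around the parents, which is the parameter-wise behavior that the later analysis of nonseparable functions refers to.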
Search history has also been exploited in some research, but to the best of our knowledge, none of these works aims to improve the crossover model. Since online real systems often provide uncertain evaluation values which lead to unreliable convergence of GA, Sano and Kita proposed the memory-based fitness estimation GA (MFEGA) [22]. MFEGA estimates the fitness from neighboring individuals stored in the search history. Leveraging the search history allows estimation without requiring additional evaluations. Amor and Rettinger proposed a GA using self-organizing maps (GASOM) [23]. Self-organizing maps (SOM) can provide a visualized search history, which makes the explored regions intuitive for users. Moreover, individual novelty is introduced through the activation frequency in the search history table and is utilized by the reseeding operator to preserve the exploration power. Yuen and Chow presented the continuous nonrevisiting GA (cNrGA) [24]. A binary partitioning tree called a density tree stores all evaluated individuals and divides the search space into nonoverlapping partitions by means of distributions. These subregions are used to check whether a new individual needs to be evaluated.

Overview
Principles for designing good crossover operators for RCGA are discussed in [25]. Two of them are especially important: (1) the crossover operator should preserve the statistics of the population; (2) the crossover operator should generate offspring with as much diversity as possible under the constraint of (1). Following these suggestions, the key idea of SHX is to cluster the search history and select population members from excessively generated candidate solutions while preserving the statistics represented by the clusters. Figure 1 illustrates the overview of SHX. The proposed method is performed under the framework of RCGA, which mainly involves survivor selection and crossover. Mutation is optional, but we exclude it in this work to clearly investigate the effectiveness of SHX. The proposed method is described in Algorithm 1. The population is denoted by $P$, which comprises $n_P$ individuals, and the population at the $t$-th generation is denoted by $P^t$. Similarly, parents for SHX, excessively generated candidate solutions during SHX, offspring after SHX, and survivors for the next generation are represented by $P_{par}$, $P_{can}$, $P_{off}$, and $P_{sur}$, respectively. The size of each set is denoted using $n$ with a subscript of the set name (e.g., the size of parents is denoted by $n_{P_{par}}$). In addition to $P$, our method manages an archive $A$ which preserves $n_A$ survivors throughout the generation alternation. $A$ and $P$ are initialized by randomly placing individuals in the search space. The archive update process is conducted after the survivor selection. Survivor individuals $P_{sur}$ of the current generation are aggregated into both $P$ and $A$ of the next generation. SHX can be further divided into parent selection, offspring generation, and offspring selection. Different from conventional RCGA, individuals generated from $P_{par}$ are regarded as offspring candidates $P_{can}$. The main purpose of SHX is to narrow down $P_{can}$ to $n_{P_{off}}$ individuals, denoted by $P_{off}$, according to the statistics provided by $S$. $S$ is calculated from the clustering result of the archive and directly affects the offspring selection.
SHX can adopt any existing crossover operator (e.g., BLX-α [17] and SPX [21]) for the offspringGeneration function (Algorithm 1, line 8) to generate $P_{can}$ from $P_{par}$. For the parentSelection function (Algorithm 1, line 6) and the survivorSelection function (Algorithm 1, line 11), the just generation gap (JGG) [26,27] is employed in this work. That is, the parentSelection function randomly extracts $n_{P_{par}}$ individuals from $P$ as $P_{par}$, and the survivorSelection function selects the top-$n_{P_{sur}}$ individuals in $P_{off}$ as $P_{sur}$ according to the fitness value. To show the performance increase brought by SHX, we choose the widely applied BLX-α, SPX, and UNDX for the offspring generation and compare the results in Section 6. We explain archiveUpdate (Algorithm 1, lines 3 and 13) and offspringSelection (Algorithm 1, line 9) in detail in Section 4 and Section 5, respectively.
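The JGG-style parent and survivor selection described above can be sketched as a single generation step. This is only a minimal sketch without SHX: the names `jgg_step`, `midpoint_box`, and `sphere` are hypothetical, the toy generator stands in for a real crossover operator, and setting the number of survivors equal to the number of extracted parents is an assumption (it keeps the population size constant). With SHX, more candidates would be generated and narrowed down by offspring selection before the fitness evaluation.

```python
import random

def sphere(x):
    """Toy fitness to minimize (illustrative stand-in for a benchmark function)."""
    return sum(v * v for v in x)

def midpoint_box(parents, rng):
    """Hypothetical generator: uniform sample in the box spanned by two parents."""
    a, b = rng.sample(parents, 2)
    return [rng.uniform(min(u, v), max(u, v)) for u, v in zip(a, b)]

def jgg_step(population, generate, fitness, n_par, n_off, rng):
    """One generation alternation: extract parents, generate, keep the best."""
    rng.shuffle(population)
    parents = [population.pop() for _ in range(n_par)]   # randomly extracted from P
    offspring = [generate(parents, rng) for _ in range(n_off)]
    survivors = sorted(offspring, key=fitness)[:n_par]   # top offspring by fitness
    population.extend(survivors)                         # survivors take the parents' slots
    return survivors

rng = random.Random(0)
population = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
survivors = jgg_step(population, midpoint_box, sphere, n_par=2, n_off=10, rng=rng)
```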

Survivor Archive
Since the genetic operations are run alternately and iteratively, collecting and analyzing the history data may be beneficial for boosting performance. Given that SHX aims to maintain the historical statistics $S$ while producing offspring for the next generation, the archive $A$ is designed to store $P_{sur}$ over the past few generations and to extract the statistics $S$. The calculation of $S$ is based on k-means, which is an off-the-shelf unsupervised clustering method. The pseudocode of k-means is shown in Algorithm 2. In particular, k-means is employed to cluster the individuals in $A$ based on their positions in the search space, and $S$ is a normalized frequency histogram that gives the proportion of each cluster size to $n_A$. A higher score indicates that the corresponding cluster is more likely to be a promising search region. The statistics can then be maintained by probabilistically assigning newly generated candidates to each cluster according to $S$.
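The normalized frequency histogram can be illustrated with a small sketch. It assumes that the cluster centroids have already been fitted by k-means and only shows the labeling and counting step; the names `nearest_centroid` and `cluster_scores` and the example data are hypothetical.

```python
import math

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def cluster_scores(archive, centroids):
    """Proportion of archived individuals falling into each cluster."""
    counts = [0] * len(centroids)
    for individual in archive:
        counts[nearest_centroid(individual, centroids)] += 1
    return [c / len(archive) for c in counts]

# Assumed toy archive: three individuals near the origin, one near (5, 5).
archive = [[0.1, 0.0], [0.0, 0.2], [0.2, 0.1], [5.0, 5.1]]
centroids = [[0.1, 0.1], [5.0, 5.0]]
scores = cluster_scores(archive, centroids)  # [0.75, 0.25]
```

The scores sum to one by construction, so they can be used directly as selection probabilities over clusters.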
To keep the computational cost of k-means within an acceptable and constant range, the archive size is fixed to $n_A$. That is, a part of the individuals in $A$ must be replaced with the new survivors $P_{sur}$ during the archive update to incorporate new information. Two types of update methods are considered in this work: (1) randomly selecting individuals in $A$ and replacing them with $P_{sur}$ (denoted by random); (2) replacing a part of $A$ with $P_{sur}$ in the order in which the individuals of $A$ arrived (denoted by sequential). The performance of these two approaches is compared in Section 6. The update of $A$ and the calculation of $S$ are executed in the function archiveUpdate (Algorithm 1, lines 3 and 13), which is summarized in Algorithm 3. At the replacement step (Algorithm 3, line 4), $n_{P_{sur}}$ individuals are discarded from $A$ based on the random or sequential approach, and the new $P_{sur}$ are stored in $A$. Initialization is executed when $t$ equals 0. The k-meansFit function (Algorithm 3, line 7) updates the centroids of the clusters according to the updated $A$ and assigns updated cluster labels to each individual in $A$. After that, the normalized frequency histogram $S$ over the clusters is calculated by the hist function (Algorithm 3, line 9) for use in offspring selection (Algorithm 4). Note that the initial centroids of the clusters in the current generation are inherited from the previous generation, as most individuals in $A^t$ are the same as those in $A^{t-1}$.
Figure 1: Overview of the proposed method. The proposed method is performed with an archive $A$ under the framework of RCGA. $A$ preserves survivors $P_{sur}$ over the past few generations and extracts statistics from them by clustering. Offspring $P_{off}$ are selected from excessively generated candidate solutions $P_{can}$ based on the statistics.
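The two replacement schemes for the fixed-size archive can be sketched as follows. The function name `update_archive` and the list-based archive (oldest entries at the front) are assumptions for illustration; the sketch covers only the replacement step and omits the clustering that follows it.

```python
import random

def update_archive(archive, survivors, mode, rng):
    """Return a new archive of the same size with the survivors swapped in."""
    n_new = len(survivors)  # assumed not to exceed len(archive)
    if mode == "sequential":
        # Oldest entries sit at the front, so drop them first (first in, first out).
        kept = archive[n_new:]
    elif mode == "random":
        # Drop n_new entries chosen uniformly at random, regardless of age.
        drop = set(rng.sample(range(len(archive)), n_new))
        kept = [ind for i, ind in enumerate(archive) if i not in drop]
    else:
        raise ValueError(f"unknown update mode: {mode}")
    return kept + list(survivors)

old = ["a0", "a1", "a2", "a3", "a4", "a5"]
fifo = update_archive(old, ["s0", "s1"], "sequential", random.Random(0))
rand = update_archive(old, ["s0", "s1"], "random", random.Random(0))
```

In the sequential case every individual stays in the archive for the same number of generations, whereas in the random case an old individual may survive many updates.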

Search History-Driven Crossover (SHX)
SHX randomly selects parents by following the strategy of existing crossover operators (e.g., two parents in the case of BLX and $D + 1$ parents in the case of SPX) and excessively generates candidate offspring $P_{can}$ for further offspring selection. We set $n_{P_{can}} \gg n_{P_{off}}$ because $P_{can}$ must contain a sufficient number of individuals that can be assigned to each cluster in $A$. Generating individuals excessively can also be considered a mechanism of diversity preservation. It is worth pointing out that the offspring selection is a different procedure from the survivor selection. Offspring selection belongs to the crossover model and is conducted before fitness evaluation, whereas survivor selection is conducted after fitness evaluation. Offspring selection narrows down $P_{can}$ to $P_{off}$ based on roulette wheel selection [28]. Each slot of the wheel corresponds to one possible selection (i.e., a cluster), and $S$ is used to associate a selection probability with each cluster in $A$. This can also be viewed as a procedure in which SHX preferentially selects individuals in more "promising" regions. This biased selection can encourage the evolution of the population and accelerate the overall convergence. Besides, the statistics of the population (e.g., cluster size) can be maintained between two consecutive generations because the new generation is sampled based on the statistics of the history. Also, the diversity of $P_{off}$ can be preserved because each newly generated individual from $P_{can}$ has a chance of being assigned to a cluster in $A$.

Algorithm 1: Search history-driven crossover for RCGA.

Algorithm 2: k-means.
Input: number of clusters $k$, data points $p_1, \ldots, p_n$
Output: cluster centroids $c_1, \ldots, c_k$
(1) randomly initialize $k$ cluster centroids
(2) while termination criterion is not satisfied do
(3)   for $i = 1, \ldots, n$ do
(4)     assign the nearest cluster centroid ID to $p_i$
(5)   end
(6)   for $i = 1, \ldots, k$ do
(7)     update $c_i$ by calculating the mean of data points in the $i$-th cluster
(8)   end
(9) end
(10) return $c_1, \ldots, c_k$

The algorithm of offspring selection is shown in Algorithm 4. The input $P_{can}$ is excessively generated by existing crossover operators (Algorithm 1, line 8). Each candidate is labeled by the k-meansPredict function (Algorithm 4, line 1) based on the current clusters estimated from $A$. Then, the roulette is constructed based on $S$. The roulette selection is called $n_{P_{off}}$ times, yielding $n_{P_{off}}$ selected offspring. Each roulette selection produces a cluster ID, and one candidate in $P_{can}$ that belongs to the corresponding cluster is randomly selected and assigned to $P_{off}$. To avoid duplicate selection, a selected candidate is excluded from $P_{can}$. If no more candidates correspond to a certain cluster (which is rarely the case given $n_{P_{can}} \gg n_{P_{off}}$), the roulette is reconstructed by eliminating the proportion of the corresponding cluster. Finally, $P_{off}$ is passed to the survivor selection process, which determines $P_{sur}$ using JGG.
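The roulette-based offspring selection described above can be sketched as follows. This is an illustrative reading of the prose, not the authors' implementation: the name `offspring_selection`, the nearest-centroid labeling used in place of k-meansPredict, and the handling of clusters with zero score (they are never drawn) are assumptions.

```python
import math
import random

def offspring_selection(candidates, centroids, scores, n_off, rng):
    """Pick n_off candidates, drawing clusters with probability given by scores."""
    # Label each candidate with its nearest centroid (stand-in for k-meansPredict).
    pools = {i: [] for i in range(len(centroids))}
    for cand in candidates:
        label = min(range(len(centroids)),
                    key=lambda i: math.dist(cand, centroids[i]))
        pools[label].append(cand)
    selected = []
    while len(selected) < n_off:
        # Rebuild the roulette over clusters that still hold candidates.
        live = [i for i in pools if pools[i] and scores[i] > 0]
        if not live:
            break  # fewer usable candidates than requested
        cluster = rng.choices(live, weights=[scores[i] for i in live])[0]
        pick = rng.randrange(len(pools[cluster]))
        selected.append(pools[cluster].pop(pick))  # popping avoids duplicates
    return selected

rng = random.Random(1)
candidates = [[0.1 * i, 0.0] for i in range(6)] + [[5.0, 5.0], [5.2, 5.1]]
offspring = offspring_selection(candidates, [[0.0, 0.0], [5.0, 5.0]],
                                [0.75, 0.25], n_off=4, rng=rng)
```

No fitness evaluation appears anywhere in this step, which reflects the point made above that offspring selection is conducted before fitness evaluation.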

Experimental Results
The performance of SHX is investigated over 15 benchmark functions, with each function in two different dimension settings. We comprehensively compare the performance of RCGA with and without SHX, and SHX is run with different settings of archive update methods (random/sequential) and offspring generation methods (BLX [17]/SPX [21]/UNDX [20]).

Experimental Setup.
Benchmark functions are a useful tool to verify the effectiveness of a method, and it is common to use several functions with different properties, as in [29,30]. We selected 15 benchmark functions with different characteristics from the literature [31][32][33] for evaluation. Detailed information on each function is summarized in Table 1. Initialization of the population and the archive is conducted within the range given in the 4th column of Table 1. It is worth mentioning that the search space (i.e., the range of parameters) during the generation alternation is not constrained. Each function is labeled according to its combination of characteristics (U + S, U + NS, M + S, and M + NS). By involving functions with various characteristics, we can analyze the proposed method more comprehensively and objectively. Furthermore, as all selected functions are adjustable in dimension, we adopt two different numbers of dimensions ($D = 5$ and $D = 10$) to control the difficulty of the search problem. The hyperparameter settings of the proposed method are listed in Table 2. The proposed method includes hyperparameters of not only RCGA (number of generations, $n_P$, and $n_{P_{off}}$) but also SHX ($n_{P_{can}}$, $n_A$, and $k$). In general, the search problem defined by each function becomes harder as the number of dimensions increases, which requires more evaluations. To adjust for this, the number of generations, $n_P$, and $n_{P_{off}}$ are set proportional to the number of dimensions. The constant values of each parameter are empirically determined because the purpose of the experiments is to validate the effectiveness of having SHX, rather than to achieve the best solution for each function.
All experiments are executed 100 times with different random seeds. In each run, the generation alternation is executed for the full number of generations defined in Table 2. For a fair comparison, runs with the same random seed start from the same initial population. The runtime and fitness are recorded with a Python implementation (without either parallelization or optimization) on a desktop computer with an i7-7700 CPU at 3.60 GHz and 12.0 GB RAM.
Algorithm 3: archiveUpdate.
Input: ... survivors $P^t_{sur}$, size of the archive $n_A$
Output: updated archive $A^t$, ...
//archive update
(4) randomly or sequentially (first in, first out) select $n_{P_{sur}}$ individuals ...
Computational Intelligence and Neuroscience 5

Comparison in the Final-Generation-Elite.
The absolute errors between the optimal value and the final-generation-elite fitness for all combinations of functions, dimensions, and methods are displayed in Table 3. Table 3 shows the minimum, maximum, median, mean, standard deviation (SD), and p value of the Mann-Whitney U test for each combination. The Mann-Whitney U test evaluates the significance of the results with SHX against the results without SHX at the significance level $p = 0.05$. Before showing the superiority gained by involving SHX, we first set aside a few results for which all the methods are trapped in local optima or cannot reach the global optimum. (1) Easom function $f_8$. This function has several local minima. It is unimodal, and the global minimum occupies only a small area relative to the search space, which can hardly be reached. (2) Schwefel 2.26 $f_{10}$. Since the setup of this experiment does not restrict the range of parameters during the search, an extremely small fitness value (even smaller than the global optimum) can be achieved with this function, which makes it unsuitable for comparisons.
From Table 3, we can observe a clear improvement in performance brought by SHX. The p values show that the methods with SHX are significantly different from their counterparts without SHX in at least 23 of the 30 settings. For the other five statistics (minimum, maximum, median, mean, and SD), the methods without SHX fail to outperform the methods with SHX in most settings. For instance, focusing on the minimum results, the methods without SHX outperform the methods with SHX only 5, 0, and 4 times for BLX, SPX, and UNDX, respectively. Among the variants, SHX with the sequential archive update achieves the best performance. SH-BLX_sequential, SH-SPX_sequential, and SH-UNDX_sequential show significance in 27, 26, and 27 settings, respectively. In addition, they achieve the best results in most settings with respect to the maximum, median, and mean. One possible reason why sequential outperforms random in most cases is that sequential removes the oldest individuals, which arrived first, so SHX can select offspring according to an up-to-date search history that reflects the trend of evolution more sensitively. In contrast, random removes individuals from the archive uniformly, which may impede the discovery of new solutions since old individuals may be retained in the archive for more generations.

Analysis on BLX vs. SH-BLX.
It is already known that the standard BLX [17] faces difficulties, especially when the target function is nonseparable [34], due to its parameter-wise sampling. From the results of $f_4$ to $f_7$ and $f_{12}$ to $f_{15}$ in Table 3, we can see that involving SHX significantly improves the performance, which indicates that SHX can help BLX greatly mitigate this drawback. This is understandable because offspring selection with clusters embeds a distance measure, which builds relationships among the parameters.
Algorithm 4: offspringSelection.
Function: $P^t_{off}$ = offspringSelection($P^t_{can}$, $S^{t-1}$)
Input: candidates $P^t_{can}$, score $S^{t-1}$
Output: offspring $P^t_{off}$
//labeling based on clustering results estimated in Algorithm 3, line 7
(1) clusters = k-meansPredict($P^t_{can}$);
//roulette construction
...

Analysis on SPX vs. SH-SPX.
Table 3 shows that SPX noticeably outperforms BLX. From Table 3, it is also very clear that SHX further boosts the performance of SPX to a large extent. In particular, the minimum and median results are improved by involving SHX for all settings. As pointed out in [21], SPX has the ability to maintain the mean and covariance of the parent individuals, which is consistent with the design guideline for good crossover operators mentioned in Section 3. Since SHX manages an archive that stores the search history over a few generations, it can preserve some useful statistics (e.g., centroids of clusters) much longer. That is why SHX is able to enhance SPX.

Analysis on UNDX vs. SH-UNDX.
Similar to BLX and SPX, Table 3 shows that the results involving SHX are improved in most settings. UNDX is also designed to generate offspring that inherit the distribution of the parent individuals [35]. Therefore, the statistics of the search history provided by SHX are useful for enhancing the search ability of UNDX.

Comparison in Convergence Curve.
With the aid of search history, SHX not only achieves better results but also improves the convergence speed. In this section, we compare the convergence over generations for all the test functions in the case of $D = 10$. Evaluation values of elite individuals from the 1st generation to the 100th generation are plotted in Figure 2. The mean value of 100 trials is represented by the line, and the range between the minimum and the maximum is represented by the shaded area. A smaller area means a more stable search. It should be noted that, as the ranges of parameters are not constrained during the search procedure, methods can achieve arbitrarily small fitness values, and a lower value does not mean a better result in the case of $f_{10}$, as explained in Section 6.2. For BLX, SPX, and UNDX, exploiting SHX yields faster convergence than the same operators without SHX in most cases. The superiority becomes more obvious when the problem setting is more difficult (e.g., multimodal functions $f_8$ to $f_{15}$ vs. unimodal functions $f_1$ to $f_7$).

Comparison in Processing Time.
In this section, we show the runtime overhead brought by SHX. Figure 3 compares the processing time of an optimization task ($D = 10$, with a single fitness evaluation taking 0.01 second) for BLX and SPX. The parameter setting follows Table 2, and all the results are averaged over 10 trials. It took 93.9 seconds and 94.1 seconds for BLX and SPX to complete the entire process, respectively. SH-BLX_random took 1.7 seconds more than BLX, and SH-BLX_sequential took 1.6 seconds more than BLX. Similarly, the additional runtime of both SH-SPX_random and SH-SPX_sequential over SPX was 3.9 seconds. These numbers demonstrate that the additional runtime occupies only a small part of the total processing time. The additional computational cost mainly arises from clustering the archive data and assigning labels to the candidate offspring. The cost can be further reduced by using an efficient distance measure or parallel computing. For a fixed archive size, the runtime grows linearly with the number of generations. Considering the complexity of the fitness function and the budget, SHX is a practical alternative to other crossover models.
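To put the reported timings in relative terms, the overhead can be expressed as a fraction of the baseline runtime. The ratios below are computed only from the numbers stated above and are rounded to one decimal place.

```latex
\begin{align*}
\text{SH-BLX\_random:}     \quad & \frac{1.7}{93.9} \approx 0.018 \;(1.8\%) \\
\text{SH-BLX\_sequential:} \quad & \frac{1.6}{93.9} \approx 0.017 \;(1.7\%) \\
\text{SH-SPX\_random, SH-SPX\_sequential:} \quad & \frac{3.9}{94.1} \approx 0.041 \;(4.1\%)
\end{align*}
```

Under this setting, the relative overhead stays below 5% for both operators.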

Conclusions
In this paper, we have proposed a novel crossover model (SHX) which is simple yet effective and efficient. It can be easily integrated with any existing crossover operator. The key idea is to exploit the search history over generations to gain useful information for generating offspring. Experimental results demonstrate that SHX can significantly boost the performance of existing crossovers in terms of the final solution and the convergence speed. Also, according to the experiments, the induced extra runtime is negligible compared to the total processing time. SHX still has a few limitations. (1) Additional hyperparameters need to be determined. (2) The induced additional runtime may be too large for applications which require high processing speed. As future work, we would like to address these limitations. For instance, hyperparameters can be adaptively set by considering specific contexts, and parallelization can be introduced to speed up SHX.

Data Availability
The test data used to support the findings of this study are included within the article.

Disclosure
A preliminary version of this work appears in GECCO2020 and has also been mentioned in the manuscript, which can be viewed at the following link: https://arxiv.org/abs/2003.13508.

Conflicts of Interest
The authors declare that they have no conflicts of interest.