A thorough study on genetic algorithms in feedback-based wavefront shaping

Feedback-based wavefront shaping focuses light through scattering media by employing phase optimization algorithms. Genetic algorithms (GAs), inspired by the process of natural selection, are well suited for phase optimization in wavefront shaping problems. In 2012, Conkey et al. ̄rst introduced a GA into feedback-based wavefront shaping to ̄nd the optimum phase map. Since then, due to its superior performance in noisy environment, the GA has been widely adopted by lots of implementations. However, there have been limited studies discussing and optimizing the detailed procedures of the GA. To ̄ll this blank, in this study, we performed a thorough study on the performance of the GA for focusing light through scattering media. Using numerical tools, we evaluated certain procedures that can be potentially improved and provided guidance on how to choose certain parameters appropriately. This study is bene ̄cial in improving the performance of wavefront shaping systems with GAs.


Introduction
Optical focusing is essentially important in many applications of biophotonics, such as optical imaging, laser surgery, phototherapy, and optical manipulation. However, microscopic inhomogeneity inherent to biological tissue causes optical scattering, which prevents optical focusing from being achieved beyond 1 mm in soft tissue. 1,2 These scattering e®ects have been long considered as the major obstacles that limit all these applications to a shallow depth.
To overcome this challenge, wavefront shaping technology has been developed with the capability to control and refocus scattered light. By treating optical scattering as a deterministic process, wavefront shaping technology¯nds a special wavefront, called optimum wavefront, to compensate scattering e®ects. There are three general approaches to obtain the optimum wavefront: optical phase conjugation (OPC), [3][4][5][6][7][8][9] the inversion of transmission matrix (TM), [10][11][12][13][14] and feedback-based wavefront shaping. [15][16][17][18] Compared to the other two approaches, feedback-based wavefront shaping employs the simplest implementation. In a typical experiment, incident light is¯rst modulated by a spatial light modulator (SLM) at the front side of a scattering medium. The modulated light is then directed by the scattering medium. At the backside of the scattering medium, a camera or a photodiode measures the light intensity at a targeted location, providing feedback signals. To form an optical focus at the targeted position through the scattering medium, the phase map displayed on the SLM is iteratively optimized to maximize the feedback signal.
In feedback-based wavefront shaping, various algorithms have been proposed and demonstrated to search for the optimum phase map, including a stepwise sequential algorithm (SSA), 19 a continuous sequential algorithm (CSA), 20 a partitioning algorithm (PA), 20 a simulated annealing algorithm (SA), 21 a genetic algorithm (GA), 18,22,23 a particle swarm, 22,24 a four-element division, 25 a harmony search algorithms 26 and a Hadamard encoding algorithm (HEA). 27 Each algorithm has its own pros and cons, which has been well discussed in many literatures. 22,28,29 Among all these algorithms, the GA has been recognized to be particularly advantageous in noisy environment. As the name suggests, the GA is an iterative optimization algorithm that originates from Darwin's theory of evolution and follows the basic scheme of \survival of thē ttest". The GA has been widely used in solving various complex problems through iterative¯tting, leading to an optimum individual with the highest tness. In 2012, Conkey et al.¯rst introduced the GA into feedback-based wavefront shaping to focus light through scattering media. In that work, the crossover procedure was slightly modi¯ed by bringing in a random binary mask breeding system, which exhibits a higher convergence rate compared to the conventional GA. Due to its superior performance in noisy environment, the GA has been widely adopted. The detailed procedures, as well as the choice of parameters, are largely followed by successors. However, till to date, no studies have been reported on optimizing neither the detailed procedures nor the choices of parameters of the GA. To¯ll this blank, in this study, we perform a thorough study on optimizing the performance of the GA. In particular, we propose several new schemes to form new generations and select parents. These new schemes can facilitate the initial increasing rate of the enhancement. We also investigate the choices of certain parameters in the GA, including the parental mask ratio and the mutation rate. By comparing the performance of the GA with various combinations of parameters, we provide certain guidance on how to choose these parameters appropriately under di®erent noise levels.

Genetic Algorithm
The°owchart of the GA used in feedback-based wavefront shaping is schematically shown in Fig. 1, which can be divided into eight di®erent steps. The GA begins by generating an initial population of NP phase masks. The¯tness of each phase mask is evaluated as the light intensity at the targeted position. Then, two parents, pa and ma, are selected from the population to breed o®spring. There are several possible selection schemes, but the rule of thumb is that the individual with a higher¯tness has a higher probability to be selected. With two parents being selected, a new o®spring is created as pa Á T þ ma Á ð1 À T Þ, where T is a random binary template generated with a parental mask ratio pc. The way of generating o®spring indicates that information is inherited from two parents. For this o®spring, a mutation operation is then performed. The mutation rate pm can be either a¯xed number or an adaptive variable. After mutation, the¯tness of the new o®spring is measured and recorded. The above steps of generating new o®spring are repeated with a certain number of times, until a generational condition is ful¯lled, which means that the new generation is complete. The GA is usually executed for many generations, and this generational process is repeated for a given number of times. Among all these steps, at least a few of them (steps 3, 4, 6, and 8), which are enclosed in the purple boxes, need further discussion and can be potentially improved. For example, in step 3, Ref. 18 selects only two parents based on the roulette algorithm without providing further explanations. Similarly, in step 8, Ref. 18 directly supplements the new generation with NP/2 new o®spring, regardless of their¯tness. Given that there exist multiple schemes for both steps, we should at least implement some of them and compare their performances. Moreover, in step 4, the random binary template T in Ref. 18 is generated using a¯xed parental mask ratio pc = 0.5. However, the reason of this choice has never been explained. Similarly, in step 6, the choices of the parameters in the mutation operation are not explained in Ref. 18 as well. As a result, we will test di®erent choices of these parameters in the GA and provide certain guidance on how to properly choose them.

Simulation Environment
Throughout this study, numerical tools are used to evaluate the performance of this algorithm. In this section, we describe the simulation environment used in this study. Unless otherwise stated, the following parameters will be used throughout the entire study. For each phase mask, the number of independent elements is 16 Â 16 ¼ 256, which is also the number of genes in the GA. To mimic a phase-only modulation scheme, all the elements have the same amplitude of 1. For the initialization process, the phase of each element in the phase mask is randomly distributed between 0 and 2. Using random matrix theory, the scattering medium is modeled as a TM with a dimension of 256 Â 256. Each element of this matrix is drawn from a circular Gaussian distribution. 30 The scattered light is then computed by acting the TM on the input light¯eld (the phase mask). During simulation, we have properly normalized the incident light¯eld and the TM, so that the average intensities for both the incident light and the scattered light equal 1. Therefore, the enhancement, de¯ned as the ratio between the light intensity at the targeted position and the initial average intensity, can be calculated by simply taking the absolute square of the target element in the scattered light. In this study, the element in the central¯led of the scattered light is chosen as the target for simplicity.
For the GA, the¯tness of a given input phase mask is the same as the enhancement de¯ned above. The population of each generation is¯xed at NP ¼ 50. The generational process is repeated until the number of measurements n reaches 25,000. For simplicity, when discussing one speci¯c parameter, other parameters are set to be the same as the ones used in Ref. 18. In the mutation operation, the initial mutation rate pm max, the¯nal mutation rate pm min, and the decay factor are set to be 0.1, 0.0025, and 200, respectively. In the breeding operation, the parental mask ratio pc is set to be 0.5. Since the GA is particularly advantagious in noisy environment, we follow the same strategy in Ref. 18 to add white Gaussian noise. Several levels of Gaussian noises with di®erent standard deviations 0:1hI 0 i, 0:3hI 0 i, and 0:6hI 0 i, representing lownoise level, medium-noise level, and high-noise level, were added into each measurement. Here, hI 0 i is the average intensity of the scattered light, which equals 1 due to normalization. In a typical experiment, the noise of the measurement may come from the instability of the laser and the inaccuracy of the SLM. We also note that linear interpolation was used to make plots smooth.
The numerical simulations were carried out on a personal computer with a central processing unit of Intel(R) Core(TM) i5-8600K CPU @ 3.60 GHz and a random access memory of 16.0 GB. With this computational resource, it takes about 7 s for each independent execution.
Due to statistical reasons, even for the same set of optimization parameters, the enhancement shows considerable°uctuations. To minimize these statistical errors and make reliable conclusions, each plot shown below is obtained by averaging the results from 1000 independent executions. For example, the standard deviation of the enhancement around 5000 measurements is about 10. By averaging the results for 1000 independent executions, the standard error, which estimates the deviation, reduces to only 0.3.

Results and Discussions 4.1. Schemes to form new generations (step 8)
We¯rst discuss the scheme to form new generations, which turns out to have considerable e®ects on the performance of the GA. In Ref. 18, based on the rankings of¯tness, the prior half of the individuals are added into the population of the next generation, while the latter half (NP/2) of the individuals in the current generation are dropped. Then, we generate NP/2 o®spring and supplement them into the population of the next generation, regardless of their¯tness. In this way, the new generation contains NP/2 individuals from the current generation and NP/2 o®spring. This scheme is adopted in Ref. 18, and is labeled as scheme 1. For scheme 1, it is likely that some of the newly added o®spring may even have a lower¯tness compared to those dropped ones, which may slow down the increasing rate of the enhancement. Since the enhancement increases monotonically during the optimization process from the theoretical perspective, we can consider other schemes that are slightly aggressive. One suggested modi¯cation is that when dropping the latter half of the individuals in the current generation, we record the largest¯tness of them as a threshold. When a new o®spring is created, we immediately compare its¯tness to the threshold. The new o®spring can be added into the population of the next generation only if its¯tness is larger than the threshold. In this way, we guarantee that all the dropped individuals are replaced by o®spring that have larger¯tness. This scheme is labeled as scheme 2. Another possible scheme is that we only generate NP/2 o®spring and combine them with the current generation that has NP individuals. Then, we rank all of them based on the¯tness and drop the last NP/2 individuals. This scheme is labeled as scheme 3. The comparison of three schemes are shown in Fig. 2. It can be directly seen that the increasing rates at the very beginning for all three schemes are almost the same, regardless of noise levels. Observable di®erences start to appear after 3000 measurements and become quite prominent around 7000 measurements. For the noise-free case shown in Fig. 2(a), scheme 1 has the lowest enhancement throughout the entire plot. Such an observation is in consistent with our speculation that scheme 1 is the most conservative one. Moreover, scheme 3 always outperforms scheme 2, possibly because under the same number of measurements, scheme 3 has a greater number of generations than scheme 2. This observation also indicates that it might be better to proceed to the next generation rather than spending lots of measurements within a certain generation.
When noises are added, Figs. 2(b)-2(d) show the comparisons of these three schemes. Although scheme 3 still outperforms the other two schemes, scheme 1 gradually exhibits its advantages in handling noises. With low-and medium-noise levels, although the enhancement obtained using scheme 1 is still lower than that of the other two schemes at around 7000 measurements, the gaps become smaller compared to the noise-free case. Moreover, at around 25,000 measurements, the enhancement obtained using schemes 1 and 3 becomes close and nondi®erentiable. When the noise level becomes high, as shown in Fig. 2(d), the enhancement obtained using scheme 1 has already become larger than the one obtained using scheme 2 at around 7000 measurements. Although scheme 1 still performs worse than scheme 3 at around 7000 measurements, the enhancement obtained using these two schemes at around 25,000 measurements is very close.
These results demonstrate that in a noisy environment, an aggressive scheme can increase the initial increasing rate at the expense of the¯nal convergence rate. In the applications of biophotonics and¯ber optics, our system does not have enough time to make too many measurements due to the relatively short correlation time of the scattering events. Therefore, a fast initial increase rate is preferable in the GA. For this reason, we consider scheme 3 as the most appropriate one for the GA, which will be adopted in the following simulations.

Schemes to select parents (step 3)
The GA used in Ref. 18 employs a roulette algorithm to choose two individuals from the population as two parents pa and ma. For these two parents, the one with a higher¯tness becomes pa and the other one becomes ma. With this scheme, it is possible that both parents have low¯tness, thereby breeding low-¯tness o®spring that is unlikely to be added into the next generation. In order to avoid the above-mentioned situation, we propose to select several individuals using the roulette algorithm. For convenience, we introduce a variable \selection number" S n to represent the number of selected individuals. After selecting S n individuals, the ones with the highest two¯tness are selected as parents. This method of selecting pa and ma using S n > 2 can be considered as a nonlinear roulette algorithm, where high-¯tness individuals are a lot easier to be selected. S n ¼ 2 corresponds the original scheme and S n ! 1 means directly taking the two individuals with the highest two¯tness in the generation. Figure 3 shows the performance of the GA using di®erent S n under di®erent noise conditions. For noise-free case, S n ¼ 2 shows the worst performance regarding both the initial increase rate and the¯nal convergence rate. S n ¼ 30 has the best performance but the enhancement curve is close to that of S n ¼ 20. This observation indicates that further increase of S n will not cause a big improvement. Nevertheless, one would expect that S n ! 1 should have the best performance. Surprisingly, it is shown that S n ! 1 performs even worse than S n ¼ 10. This counterintuitive e®ect may result from the \genetic drift", which corresponds to nonergodic in nature. In this situation, the increase of the enhancement is largely owing to mutations. When di®erent levels of noises are added, Figs. 3(b)-3(d) show the performance of the GA with various S n . For low-and medium-noise levels, S n ¼ 2 always has a relatively slow initial increase rate but gradually surpasses other situations in the end. This observation also indicates a counterbalance between the initial increasing rate and the¯nal convergence rate. When the noise level becomes high, S n ¼ 2 outperforms other situations even in a very early stage. In general, we conclude that in a noisy environment, a larger S n leads to a faster initial increasing rate but a slower¯nal convergence rate. Moreover, the the noise level is higher, the earlier S n ¼ 2 surpasses other schemes.
Based on the above analysis, we propose to use an adaptive S n , which is a large number in the beginning and gradually decrease to 2 in the end. Such an adaptive S n is expected to achieve a fast initial increasing rate and a fast¯nal convergence rate simultaneously. Here, we use the following formula: S n ¼ ðS n ini À 2Þ Â expðÀ Â n=NP Þ þ 2, where S n ini is the initial selection number and is a constant number. Numerically, we found that GA performs quite well when S n ini ¼ 15 and ¼ 125. Figure 4 shows the comparison of S n = 2, S n = 30, and the adaptive S n , under di®erent noise levels. As can be seen from the¯gures that the adaptive S n performs well throughout almost the entire measurement range under di®erent noise levels. The only unsatisfactory performance of adaptive S n shows up in Fig. 4(d), possibly due to the fact that the parameters in the adaptive S n were not optimized to the best condition. Further optimizations on the adaptive S n can be performed by scanning di®erent combinations of the parameters including S n ini and or considering other transition formulas.

Choices of the parental mask ratio pc (step 4)
A key parameter in generating the random binary template T is the parental mask ratio pc. During generation, each element of T will be randomly assigned to either \1" or \0", with a probability of pc and (1pc), respectively. A new o®spring is generated according to the following formula: 18, pc is xed to 0.5, which means that the o®spring inherits equal information from pa and ma. Given that pa always has a higher¯tness compared with ma, one would expect that more information should be inherited from pa. To validate this assumption, we consider several di®erent choices of pc ranging from 0.5 to 0.65 with a step size of 0.05. Besides using a¯xed pc, we also consider a tunable pc that is determined as the¯tness ratio of the parents: pc = pa¯tness/(pa¯tness + ma¯tness), where-pa¯tness and ma¯tness are the¯tnesses of the selected pa and ma. Figure 5 shows the performance of the GA with di®erent choices of pc. For all noise levels, we found that pc = 0.5 always shows the best performance but is only slightly better compared to other choices. Therefore, we conclude that varying the parental mask ratio pc has little e®ect on the performance of the GA. This result is actually in consistent with the Darwin's theory of evolution, where the chromosomes of o®spring equally inherit from their parents (pc = 0.5).

Choices of the mutation rate pm (step 6)
Mutation is a very important operation in the GA to maintain genetic diversity as well as to jump out of a local maximum. In general, the GA with a very small mutation rate may su®er from \genetic drift" while a very large mutation rate can lead to the loss of good solutions. Given the nature of wavefront shaping problems, Ref. 18 used an adaptive mutation rate that is high in the beginning and gradually vanishes in the end: pm ¼ ðpm maxÀ pm minÞ Â expðÀn=Þ þ pm min. To¯nd reasonable settings, it is worth tuning these parameters. However, a detailed analysis on how to choose these parameters has never been discussed. To¯ll this blank, we perform an ergodic scanning on the parameters of pm max, pm min, and . We , which leads to a total number of 240 optional combinations. To save computational resources, we decrease the number of average times from 1000 to 100 during this ergodic search.
As a typical example, Fig. 6 plots the enhancement as a function of the number of measurements under the high-noise level. As can be seen from the Fig. 6(a) that di®erent combinations of these parameters show signi¯cant variations, which indicates that choices of these parameters play signi¯cant roles in the performance of the GA. To quantify the performance, we de¯ne a quantity: F ðNÞ P n N n¼1 ðnÞ. Pictorially, F ðNÞ is the area enclosed by the enhancement curve and the horizontal axis before N measurement. In general, a larger area indicates a better performance. Since the most rapid increase happens before 5000 measurements, we choose F (5000) as the¯gure of merit.
For all the 240 possible combinations, it is found that for high-noise level, the combination of pm max = 0.1, pm min = 0.01, and ¼ 350 leads to the largest F (5000). For medium-noise level, the combination of pm max = 0.1, pm min = 0.005, and ¼ 450 leads to the largest F (5000). For lownoise level, the combination of pm max = 0.09, pm min = 0.0025, and ¼ 650 leads to the largest F (5000). From the above results, we note that pm max is always around 0.1, regardless of the noise level. This value may be related to certain parameters that were¯xed during this simulation, such as the number of independent controls (number of genes) and the population number. Moreover, as the operation of mutation is helpful in dealing with environmental noises, a relatively larger pm min is helpful when a higher noise level exists. For the same reason, a higher noise level requires a slower decay of mutation rates, which leads to a smaller .
To further illustrate the functions of each parameter separately, we compare the performance of the GA by varying only one parameter at a time. At high-noise level, pm max = 0.1, pm min = 0.01, and ¼ 350 are chosen as the baseline. Figure 6(b) shows the performance of the GA when only pm max is varied. As can be seen from the¯gure that pm max signi¯cantly a®ects the early increasing rate of the enhancement but has little e®ects on the¯nal convergence rate. Among all di®erent choices of pm max, the best performance is obtained when pm max = 0.1. In contrast, as shown in Fig. 6(c), pm min does not have signi¯cant e®ect on the¯rst few thousands measurements. However, an improper choice of pm min signi¯cantly degrades the enhancement after 5000 measurements. Among all di®erent choices of pm min, the best performance is obtained when pm min = 0.01. Figure 6(d) shows the performance of the GA with di®erent . Although ¼ 350 is found to have the largest F (5000), there is no big di®erence among the curves obtained with di®erent . Therefore, we conclude that the performance of the GA is not sensitive to the choice of .

Con°ict of Interest
We declare that there is no con°icts of interest in this paper.

Conclusion
In conclusion, we have done a thorough study on the performance of the GA in feedback-based wavefront shaping. Using numerical tools, we investigate the performance of the GA with several di®erent schemes, including the formation of new generations and the selection of parents. With new schemes, the initial increasing rate of the enhancement can be improved. Moreover, we also discuss di®erent choices of the parental mask ratio pc and the mutation rate pm. Guidance on how to properly choose these parameters is provided. We anticipate that this work can be bene¯cial to future studies related to feedback-based wavefront shaping with GAs.