Time optimal entrainment control for circadian rhythm

The circadian rhythm functions as a master clock that regulates many physiological processes in humans including sleep, metabolism, hormone secretion, and neurobehavioral processes. Disruption of the circadian rhythm is known to have negative impacts on health. Light is the strongest circadian stimulus that can be used to regulate the circadian phase. In this paper, we consider the mathematical problem of time-optimal circadian (re)entrainment, i.e., computing the lighting schedule to drive a misaligned circadian phase to a reference circadian phase as quickly as possible. We represent the dynamics of the circadian rhythm using the Jewett-Forger-Kronauer (JFK) model, which is a third-order nonlinear differential equation. The time-optimal circadian entrainment problem has been previously solved in settings that involve either a reduced-order JFK model or open-loop optimal solutions. In this paper, we present (1) a general solution for the time-optimal control problem of fastest entrainment that can be applied to the full order JFK model (2) an evaluation of the impacts of model reduction on the solutions of the time-optimal control problem, and (3) optimal feedback control laws for fastest entrainment for the full order Kronauer model and evaluate their robustness under some modeling errors.


Introduction
The circadian rhythm is a mechanism with which living beings can synchronize their biological processes with the light and dark pattern of the terrestrial day [1]. Effectively, the circadian rhythm functions as a master clock that regulates these processes [2]. In humans, the circadian rhythm is heavily linked to various physiological processes, including sleep, metabolism, hormone secretion, and neurobehavioral processes. Disruption of the circadian rhythm is known to have negative impacts on health, ranging from fatigue in travelers with jet-lag to an increased risk of cancer in rotating shift workers [3].
Light is the strongest circadian stimulus. In the literature, there are mathematical models that capture the dynamics of the circadian rhythm and how light affects it. The most detailed models are based on the biochemical and gene regulation processes behind the circadian PLOS  rhythms, such as those in [4][5][6]. Empirical models, such as variants of the well-known Jewett-Forger-Kronauer (JFK) model [7,8], are simpler and capture the essential behavior of the human core body temperature oscillation and the effect of light on the phase and amplitude of this oscillation. As demonstrated in [9], the JFK model may be considered as the asymptotic case of the biochemical models in an averaged sense. One aspect of circadian rhythm regulation that has received a lot of attention is the (re) entrainment problem, i.e., the problem of driving a misaligned circadian phase to a reference circadian phase. Such problems occur, for example, in travelers with jet-lag or workers with rotating shifts. This problem is typically expressed as an optimal control problem of a system with nonlinear dynamics. The control inputs into the system are typically light [10] and chemicals that target circadian genes [11]. Some researchers have proposed to use model predictive control to deal with the nonlinear dynamics of the circadian rhythm without any guarantee of optimality of the solutions [12][13][14][15]. In contrast, others consider the time-optimal control problem related to circadian entrainment [16]. Our prior work (c.f. [17][18][19][20][21]) that used the Pontryagin Maximum Principle approach falls under this category. A related work reported in [22] also posed the time-optimal control problem and solved it by assuming that the optimal light schedule would alternate between using maximally bright circadian light and darkness. In the optimal control literature, such strategies are called bang-off strategies. Subsequently, the timeoptimal scheduled is computed by optimizing the switching times of the light (i.e., between light on and off).
In this paper, we solve the time-optimal control problem of the fastest entrainment on the JFK model. The JFK model for human circadian rhythm is a third-order nonlinear differential equation, which is detailed in Sec. 2. In our prior work [18,19], we study the optimal entrainment of a reduced model with second-order dynamics, which is obtained by ignoring a (fast) part of the dynamics, which is called the Process-L. In some later work [20,21], we study the optimal entrainment of a further reduced model, which is obtained by ignoring the amplitude of the circadian oscillation and focusing on the phase dynamics of the oscillation. These reductions were necessary to solve the time-optimal control problem; otherwise, the solution procedures that we used were not numerically stable. The contributions of this paper can be stated as follows.
• We formulate a general solution for the time-optimal control problem of fastest entrainment that can be applied to the full order JFK model.
• We evaluate the impacts of the ignored dynamics in the reduced-order models on the solutions of the time-optimal control problem.
• We formulate optimal feedback control laws for fastest entrainment for the full order JFK model and evaluate their robustness under modeling error.
The main tool that we use to achieve these results are (i) the lower order models that allow us to compute the solutions of similar (but simpler) problems as initial approximations of the optimal solution, and (ii) the calculus of variations that allows us to formulate a functional gradient descent algorithm to minimize the objective (i.e., entrainment time) from the initial approximations above. Note that the algorithms which are proposed to calculate the (locally or globally) optimal light input for the minimum-time entrainment only depend on the known initial states (open-loop form) instead of the current circadian states during the whole entrainment time (feedback form). If the given values of the initial states are not accurate or disturbances in the light inputs or circadian states occur during the entrainment processes, the minimum-time optimal light inputs which are given as a function of time might turn out to be invalid for entrainment. The feedback implementation of the optimal light input becomes necessary for robust entrainment. To implement the feedback entrainment, we collect data from the computed open-loop optimal trajectories to learn an optimal feedback control strategy. In contrast, existing results to this problem, e.g., those reported in [22], only report on the procedure to obtain the open-loop optimal trajectories. We also demonstrate the robustness of the optimal feedback control strategy by applying it on problem setups that are not used in the training data set.
The rest of the paper is organized as follows. Section 2 covers the full order model and the definition of the time-optimal control problem. In Section 3, we define the reduced-order models and the corresponding time-optimal control problems. In Section 4, we present the algorithms that we use to solve the time-optimal control problems above. In Section 5, we evaluate the impacts of the ignored dynamics in the reduced-order models on the solutions of the time-optimal control problem. In Section 6, we present the optimal feedback control laws for the time-optimal control problem. Finally, Section 7 concludes the paper with some discussion on the main results.

Problem formulation
The circadian rhythm entrainment problem in this paper is studied based on the JFK model [7]. This model is proposed on the dynamics and light-induced variations in the core body temperature (CBT), comprised with two processes: (1) the Process-L simulates the transduction from the light energy received by the retina to the neuron stimulus transmitted by the retina. It is also closely linked to the light adaption process of our eyes; (2) the Process-P simulates when the SCN receives the stimulus from the retina, how does the CBT change with this light-induced stimulus. The dynamics of the CBT oscillator is formulated and normalized on the Van Der Pol limit cycle. The dynamics of this circadian rhythm model is expressed as the following ordinary differential equations: dn dt ¼ 60ðaðIÞð1 À nÞ À bnÞ; ð3Þ Here x(t) and x c (t) are the states of the CBT oscillator, and n(t) is the state of the process that represents retinal photoreceptor saturation (termed Process-L). The states x(t) and n(t) are normalized and do not have physical units. The unit of x c (t) is h −1 . Light input that enters the receptor is represented by its intensity I(t), which drives the Process-L. The signal u(t) is the input to the circadian oscillator, downstream from the Process-L. The values of the parameters of this model are reported in Table 1 below.
We define a 24 h-periodic light input simply simulating the natural light-dark cycle, which is treated as the reference light in the following entrainment processes. The reference state trajectories in this model, denoted as � xðtÞ, � x c ðtÞ and � nðtÞ, respectively, are the stable limit cycle solution of the JFK model with the periodic input I w (t). The time-optimal entrainment (TOE) problem is formulated as finding the light input I(t) that drives a given initial state to this periodic solution as quickly as possible. The jet-lag caused by rapid transmeridian travel is considered in this paper. We basically assume that the circadian rhythm of the entraining subject keeps synchronization with the local time in the starting point during the travel time, formally:

PROBLEM (TOE)
Given the system dynamics (1)-(6) and initial conditions where T lag 2 (0, 24) is the amount of jet-lag, i.e., time shift between the destination and starting point of travel. We want to minimize T such that xðTÞ x c ðTÞ using I(t) as the optimization variable, where tol 1 = 0.01 corresponds to about 30 minutes difference in the circadian phase and is small enough to ignore. We assume that, when the entraining state trajectory reaches the reference one (achieves the stopping criterion), the reference light in (7) is applied on the subject. Thus, the entrainment light strategy in this model during the whole time is given asÎ ( where I � (t) represents the optimal light input of the TOE problem. As ½� x; � x c � is the stable periodic solution of the dynamics with the input I w , the reference light I w will drive the entrainment state [x, x c ] to the reference state ½� x; � x c � gradually if they are close with each other. Note that n is introduced in the Process-L of the JFK model to simulate the nonlinear relation of the light input and circadian drive received by the SCN. It has a very fast time scale and very small effects on the stability of the limit cycle, which is shown in the following section. The full order model behaves just like the 2nd-order one. Therefore, n is ignored in the terminal condition (9). We also impose the constraint for all t 2 [0, T], where I max is a parameter representing the maximum light intensity that can be used. Throughout the paper, we define the circadian phase(in radian) of any point on the limit cycle of the circadian oscillation [x, x c , n] as

Model simplification
To reduce the complexity of solving the optimization problem above, some simpler models have been introduced. The optimization problem was then reformulated for the simplified models. In this section, we briefly review the simplified models and compare their behaviors under the periodic light input in Eq (7).

Second-order model
In this model, we exploit the fact that the Process-L (3) has a much faster time scale than the circadian oscillator. We use quasi steady-state approximation to assume that the Process-L is always at its steady-state. This yields n ¼ aðIÞ Further, from (5) we have Since u is related to I through a static nonlinear mapping (13), we can simplify the circadian oscillation model by assuming that u is the control input and removing the Process-L from the model. The resulting model is therefore Further, we map the periodic input in (7) through (13) and obtain a 24 h-periodic circadian input The reference state trajectories in this part, denoted asxðtÞ andx c ðtÞ, respectively, are the stable limit cycle solution of the 2nd-order model with the periodic input U w (t).
For the 2nd-order model, we can define a surrogate for the time optimal entrainment problem (TOE) as follows: PROBLEM (TOE-2nd Order).
using u(t) as the optimization variable. We also impose the constraint for all t 2 [0, T], where u max is a parameter representing the maximum circadian input that can be used.

First-order model
Exploiting the fact that the 2nd-order circadian model has a stable limit cycle, we can further reduce the 2nd-order model to one that basically only captures the dynamics of the phase of the oscillation, and thus ignores the amplitude of the oscillation. The first-order model is of the form where θ (in radians) is the circadian phase. The parameter ω 0 is the so-called free running frequency, whose value ω 0 = 2π/24.2 rad/h is chosen to match the frequency of the limit cycle (with u(t) � 0) in the second-order model (14)- (16). The input u is the circadian input with the same interpretation as in (13). The function f(θ) is called the phase response function (or phase response curve) in the literature [16,20,23,24]. Based on (21), we can interpret this function as a map from the timing of the introduction of an impulsive input to the resulting shift in the circadian phase. We compute the phase response function f(θ) with the following procedure.
Step 1. Obtain the free running periodic solution to (14)-(16) by setting u(t) � 0. Denote this solution as ðx f ðtÞ;x cf ðtÞÞ and the period as T ω . We assume the timing convention occurs atx fc ð0Þ ¼ 0 andx f ð0Þ is at maximum.
Step 2. Choose different values of τ 2 [0, T ω ], run the model in (14)- (16) with initial conditions xð0Þ ¼x f ðtÞ and x c ð0Þ ¼x cf ðtÞ with the impulsive input signal The corresponding impulsing phase is defined as where u max = 0.2208 corresponds to I max = 10000 lux. Here, Δ is a much shorter time interval than 24 hours. In our implementation, we choose Δ = 0.5 hour.
Step 3. Since (x, x c ) will converge back to the free running periodic orbit, we define the resulting time shift T θ (modulo the period T ω ) such that Then, the phase response function is given by The resulting PRC is demonstrated as Fig 1, which is used for the formulation of the 1st-order model in (21). Forcing the first-order model in Eq (21) with the periodic circadian input U w (t) from Eq (17) results in a periodic limit cycle (modulo 2π), where the circadian phase trajectory is denoted as � yðtÞ. For the first-order model, we can define a surrogate for the time-optimal entrainment problem (TOE) as follows: PROBLEM (TOE-1st order). Given the system dynamics (21) with an initial condition minimize T such that using u(t) as the optimization variable, where tol 2 = 0.1 also corresponds to about 30 minutes difference in the circadian phase. We also impose the constraint for all t 2 [0, T], where u max is a parameter representing the maximum circadian input that can be used.

Behaviors under periodic light inputs
Under the periodic inputs I w (t) (for the full order model) and U w (t) (for the reduced models), the system exhibits stable limit cycles. Fig 2 shows a comparison between the periodic orbits of these models for I w,max = 100 lux (comparable to the light intensity in a dim indoor space such as office corridors or elevators [25]), and I w,max = 10000 lux (comparable to the light intensity outdoor on a bright day). The corresponding values of U w,max (through (13)) are 0.1028 and 0.2208, respectively. We can make the following observations from these simulation data: • The periodic orbits of the full order model and the second-order model are practically the same.
• There is a relative phase shift between these models, marked by the circadian phases at sunrise. At I w,max = 100 lux, the largest gap is 0.0974 rad (22.31 min) between the first-order and second-order models. At I w,max = 10000 lux, the largest gap is 0.1534 rad (35.15 min) between the second-order and the full order model.
• The impact of the Process-L is more significant at the higher circadian light intensity. This can be observed by comparing the waveforms of the circadian input u(t) in the full order and second-order models. At I w,max = 10000 lux, u(t) has more pronounced spikes when the light is switched on, compared to at I w,max = 100 lux.

Solution strategies for the time optimal entrainment problems
Solution strategies for TOE-2nd order and TOE-1st order have been reported in [19,20], respectively. For completeness, we summarize the strategies in this section. First, note that all system models are single input affine nonlinear systems. That is, they can be written in the following format where x 2 R n is the system state, and u 2 R is the control input. We assume that the control input is bounded, Further, the time optimal entrainment (TOE) problem can be cast as a time optimal control problem with terminal state constraint, which can be formulated as minimizing the objective  subject to the initial and terminal state constraints ZðxðTÞ; TÞ ¼ 0: The function Z : R n � R ! R can be used to represent the fact that the terminal state must be at a time-varying target state (i.e., the reference). In terms of the entrainment problem of the three models mentioned in this paper, we define the expression of η = 0 based on (9), (19) and (27), respectively.
Following the Pontryagin's Maximum Principle, we formulate the augmented cost function where lðtÞ 2 R n is the co-state and κ is the Lagrange multiplier corresponding to the terminal state constraint. The first variation of the objective J with respect to the input signal υ(t) is given by where δυ(t) is the perturbation of υ(t). The optimal control input can be found by where ξ � (t) and λ � (t) are optimal state and co-state trajectories, respectively. We can see that the optimal control input therefore necessarily follows a bang-off strategy The evolution of the state follows the model in (29), while the co-state evolves according to Further, λ � (t) needs to satisfy its terminal condition, which is also known as the transversality condition [26,27]. Here, T � and κ � denote the optimal time and Lagrange multiplier, respectively. To solve the optimal control problem above, we use two algorithms, which are explained below along with their strengths and weaknesses.

Direct shooting algorithm (DSA)
In this algorithm, we view the two point boundary value problem above as finding the appropriate initial co-state λ � (0). Observe that if (ξ � (t), υ � (t), λ � (t)) satisfy (29), (37), and (38), then so do (ξ � (t), υ � (t), cλ � (t)) for any positive scalar c. The right scalar c has to be chosen to satisfy the transversality condition (39). Therefore, the search space for λ � (0) can be reduced from R n to the unit sphere in R n . It turns out that for TOE-2nd Order and TOE-1st Order, we can solve the two point boundary value problem rather efficiently. For TOE-2nd Order, this means λ � (0) can be sought on a circle. For TOE-1st Order, there are only two candidates for λ � (0). The basic direct shooting algorithm can then be expressed as follows: Step 1. Create N sample points on the unit sphere in R n . Denote them as λ 1 , � � �, λ N .
Step 2. For each of i 2 {1, � � �, N} do: simulate ξ(t) and λ(t) forward using (29), (37), and (38), under initial conditions ξ(0) = ξ init and λ(0) = λ i . Terminate the simulation when the final state constraint is satisfied, or when t reaches an upper bound T max . The upper bound T max can be initially set based on the time needed for open-loop entrainment, and subsequently reduced as shorter entrainment times are found.
Step 3. Several locally optimal solutions can be found using the direct shooting method by searching abundant guesses of the initial co-state values. The optimal solution with the minimum entrainment time among all found solutions is treated as the globally optimal one, i.e., the optimal entrainment time is Note that (29), (37), (38), and (39) provide us with a necessary condition for local optimality. This direct shooting algorithm provides us with a means to search (through sampling) for all solutions that satisfy this condition and therefore has the advantage of not getting trapped in local minima. The weakness of this algorithm is that for higher-order systems, such as the full order model, the number of samples generated can be impractically large. Another weakness is the co-state dynamics in (38) is unstable for the second-order and the full order models. This means forward numerical integration of this dynamics is unreliable for long periods of time, e.g., in jet-lag cases with long entrainment time.
Representatives of the optimization results using the DSA for a jet-lag case, where a subject whose initial circadian phase is at 6 am (corresponding to θ(0) = 0.8075 in TOE-1st Order) seeks to minimize her entrainment time for a 12-hour jet-lag are shown in Fig 3. We assume that u max = 0.1028, u max = 0.1731 and u max = 0.2208 (corresponding to steady circadian lighting levels of 100 lux, 1000 lux and 10000 lux, respectively). Applying the same algorithm for the corresponding TOE-2nd Order case (corresponding to x(0) = 0.8033 and x c (0) = −0.6905) leads to a success for the bright circadian light cases (u max = 0.2208 and 0.1731), as shown in Fig 4. For dim circadian light (u max = 0.1028), the DSA does not converge because of the instability of numerical forward integration of the co-state dynamics in (38). This case is illustrated in Fig 5, where we can observe that even with the same initial co-state value at t = 0, λ 1 and λ 2 from the numerical forward integration diverge from the backward integration values at about 180 hours. As the optimal entrainment times with I max = 1000 and 10000 lux are always less than 150 hours, the direct shooting algorithm has no issues in the forward integration in these cases.

Gradient descent algorithm (GDA)
This algorithm addresses the weaknesses of the DSA. The idea is to use gradient descent search for the optimal input υ � (t) by using the first variation in (35). The basic gradient descent algorithm can then be expressed as follows: Step 1. Set k = 0. Choose an initial guess for control input u 0 : R þ ! ½0; u max �.
Step 3. Find the final condition for the co-state by solving for κ k and λ k (T k ).
Step 5. Compute the descent direction using (35), i.e., Step 6. Update the control input along the descent direction: u kþ1 ðtÞ ¼ min ðmaxð0; u k ðtÞ À a k D k ðtÞÞ; u max Þ: ð43Þ The step size α k > 0 should be chosen such that the entrainment time is improved or the same as the previous value (T k+1 � T k ). In our simulations, we use a line search (bisection search, etc.) to solve the optimization problem given as follows: Tð min ðmaxð0; u k ðtÞ À aD k ðtÞÞ; u max ÞÞ: This process would be time-consuming, but it guarantees that the updating step decreases the entrainment time as much as possible or has no effects on the entrainment time in every iteration.
Step 7. Increment k by 1. Repeat from Step 2 until convergence. In our numerical simulations, we set the stopping criterion for the gradient descent process as Z T kþ1 0 ju k ðtÞ À u kþ1 ðtÞjdt � 10 À 10 : The stopping criterion is used to verify that υ k (t) has reached or approximately reached the fixed point and the entrainment time cannot be improved. In most cases, the gradient descent algorithm takes at most 50 iterations to reach the stopping criterion mentioned above.
We can show that the following property characterizes fixed points of the iteration procedure above.
Note that convergence of the iteration essentially means we hit a locally optimal control input. While the locality is a drawback of this algorithm, it has an advantage over the DSA because it does not require forward simulation of (38) that is numerically unstable. We demonstrate the use of GDA in solving the 12-hour jet-lag case for TOE-2nd Order (corresponding We have shown in the previous section that DSA cannot be used to solve this problem. The TOE solution of this case is shown in the upper panels in Fig 6. Observe that since the co-states dynamics are integrated backward, we do not have the instability issue encountered in the DSA. Fig 6  also shows the TOE results of the 2nd-order model with I max = 1000 and 10000 lux. We can observe in these cases that the amplitude suppression occurs in the oscillator of the core body temperature. The amplitude suppression regions are usually called the phaseless set [28], which, under higher light intensities, gives a 'shortcut' for the circadian states to follow and shortens the entrainment time. We can also observe in the amplitude suppression cases that, in the first two days, the light inputs are almost centered around the minimum value of x, i.e., the minimum value of the core body temperature. These light strategies are very similar to those in the trials in [29], whose results also prove that amplitude suppression contributes to the rapid circadian phase shift (as strong type 0 PRC) and fast entrainment under bright light.
We also apply GDA on the full order TOE problem. Again, we assume that the subject's initial circadian phase is at 6 am (corresponding to x(0) = 0.7562, x c (0) = −0.7591, and

Combining direct shooting and gradient descent algotihms
Local optimality of the results of GDA implies that their quality heavily depends on the initial guess of the optimal control input. On the other hand, DSA can be used effectively to find the global optimal solution for lower-order models. We then combine the use of both algorithms by using the solutions obtained using DSA on a lower-order model to warm start the GDA. Specifically, if u � (t) is the optimal circadian input found by applying DSA on a 1st-or 2ndorder model, then our initial guess for the optimal circadian light input intensity I � (t) for the Time optimal entrainment control for circadian rhythm  Time optimal entrainment control for circadian rhythm Fig 8. (Left) Two locally optimal solutions of TOE (full order) for the 11-hour jet-lag case with I max = 1000 lux, obtained by using the GDA with two different initial guesses of the circadian light input signal. (Right) Graphs of the corresponding optimal circadian light inputs I � (t) and the descent direction Δ(t). We can also observe that, per Lemma 2, both solutions are (locally) optimal.
https://doi.org/10.1371/journal.pone.0225988.g008  full order model is determined by Then, we apply the GDA to further optimize I � (t). Note that we do not assume that the GDA will converge to I � (t) that only assumes two values, 0 or I max .
We compare the performance of the solutions that we obtained using this strategy with published results by Serkh and Forger [22]. The entrainment time for various jet-lag cases for our results and those from [22] are shown in Fig 10. In the same figure, we can also see the performance improvement between the solutions before the application of the GDA algorithm (thus, the initial guesses) and the solutions after the GDA algorithm. The comparison between the blue (results in [22]) and black curves (our results) verifies the performance of our algorithm. Also, the TOE light inputs from our algorithm (shown as the green dash curves in Fig  6) are also consistent with those in [22] (in particular, Fig S13 in [22]).
The entrainment time in Fig 10 implies that the maximum entrainment time in our gradient descent results occurs at around 12-13 hours shift (westward travel), which means that the JFK model is almost east-west symmetrical. These results are consistent with those in [22], which addressed the minimum-time entrainment problem of the same circadian model. However, if we only use the natural light-dark cycle in (7) with I w,max = 100 lux for entrainment without any other light input, the entrainment time in Fig 11 demonstrates that the JFK model is still east-west asymmetrical as maximum entrainment time occurs at around 15 hours westward shift. These results are consistent with those in [28], which also considered the entrainment of the JFK model using the natural light-dark cycle. These results imply that the JFK model is east-west asymmetrical under natural light but the optimal light inputs for minimum-time entrainment can remove this east-west asymmetry.
The solution strategy proposed by Serkh and Forger is briefly described as follows. It assumes that the optimal light input I � (t) only has two values, 0 or I max . This assumption is motivated by the fact that time optimal control problems with bounded input constraints generally have bang-bang or bang-off optimal solutions [26,27]. The exceptions of this case are when the optimal solutions lie in a so-called singular arc [26,27].
The switching time instants of the light input are parameterized as t 1 , t 2 , � � �, t N and considered as the optimization variables. The optimal control problem in TOE is reformulated as follows. To represent entrainment, the concept of isochrons is used. Essentially, this is established by using a phase function φ : R � R ! S 1 ; such that two pair of states of the circadian oscillator (x 1 , x c1 ) and (x 2 , x c2 ) are considered to have the same phase if φ(x 1 , x c1 ) = φ(x 2 , x c2 ). Then, the following objective is optimized under the constraint φðx 1 ðTÞ; x c1 ðTÞÞ ¼ φðx 2 ðTÞ; x c2 ðTÞÞ: ð47Þ The constant C > 0 in (46) represents the trade-off between minimizing the end time and matching the amplitude of the circadian oscillator with that of the reference signal. The optimal switching time instants t � 1 ; t � 2 ; . . . ; t � N are then computed using a gradient descent technique. Compared with [22], the algorithm used in this paper is proposed without the assumption that the minimum-time optimal input is always bang-off and computes the light  1000 lux (middle), and 10000 lux (bottom). The black curves represent results obtained using the GDA algorithm with initial guesses described in Sec. 4.3, while the blue curves represent those from [22]. The red curves represent the results of applying the optimal solutions of the corresponding TOE 2nd-order problems to the full order model. These are the initial guesses before the GDA is applied. input during the whole entrainment process instead of only the switching times. The results in [19] and [30] show the existence of singular arcs in the minimum-time entrainment solutions in several circadian rhythm models. These singular arcs violate the assumption that the optimal input is always bang-off. Therefore, our algorithm is more general in solving the optimal solution in various models.

The impacts of model reduction on the TOE solutions
In Sec. 3, we discussed two steps of model reduction of the full order JFK model. First, we ignored the dynamics of the Process-L and introduced the 2nd-order model. Second, we ignored the amplitude of the oscillation and introduced the 1st-order model. Here, we study the impacts of these model reduction steps on the solutions of the time-optimal entrainment problems. The TOE solutions of the full order model are also demonstrated in Figs 3 and 6. Comparing them with the TOE solutions of the 1st-order and 2nd-order models, we observe that, when the light input is dim (100 lux in the upper panels), the light on-off switching times in the TOE solutions of the 1st-order and 2nd-order models are both very similar with those in the full order model, while the difference between the 1st-order and full order model is enlarged when the light intensity is increased (middle and lower panels in Fig 3). The TOE solutions of the 2nd-order model are close to those of the full order model even under a bright light input, as demonstrated in the lower panel in Fig 6. However, even if we apply the optimal light input from the 2nd-order model on the full order model in the form of (45) in the bright light cases, it still brings a large difference in the entrainment time from the optimal entrainment time in many cases, shown as Fig 10. Specifically, for jet-lag cases, we evaluate the performance of the solutions derived using the lower-order models when applied on the higherorder models. For this, we simply replay the computed optimal control input trajectory u � (t) on the higher-order models until we reach entrainment, which is marked by the circadian Time optimal entrainment control for circadian rhythm states of the subject matching those of the reference, within a certain error tolerance band. If at the end of the computed u � (t) we do not reach entrainment, we append the control input with the reference input trajectory until we reach entrainment. As a concrete example, suppose that we consider the 12-hour jet-lag case with the maximum circadian lighting level of 10000 lux, and want to replay the optimal control input trajectory u � (t) computed using the 1st-order model on the full order model. The trajectory u � (t) in this case has a duration of 121 hours, as shown in Fig 3. The same 12-hour jet-lag case for the full order model is associated with the initial conditions ½x; x c ; n�ð0Þ ¼ ½0:7562; À 0:7591; 0:4061�: We then simulate the full order JFK model with the initial conditions above under the circadian light intensity signal I � (t) obtained using Eq (45) until entrainment is reached or t = 121 hours. If we do not reach entrainment by t = 121 hours, we continue I � (t) with the reference circadian lighting, until entrainment is reached. Physically, this step means exposing the subject to the local/reference circadian lighting until his circadian rhythm is synchronized with the local/reference circadian rhythm. Fig 12 shows the results from the procedure above. We can observe that at lower circadian light intensity the gaps between the solutions of the time optimal entrainment problem of the reduced model and those of the full order model are smaller. In particular, at I max = 100 lux, the solutions obtained using the second order are very similar to those from the full order model. However, the solutions obtained using the first order model already deviate quite significantly from those from the higher order models. This indicates that taking into account the amplitude dynamics is relatively more important than the Process-L dynamics in solving the time optimal entrainment problem.

Feedback implementation of the TOE solutions
The solution strategies obtained with all of the approaches described in the previous section are open loop in nature. Given an initial state and the reference state trajectory, the entire optimal control input trajectory υ � (t) and the corresponding optimal state trajectory ξ � (t) can be computed. However, because of Bellman's Principle of Optimality [26,27], we also know that the optimal input can be given in terms of an optimal feedback law υ � (ξ � , t). In this section, we discuss our approach to construct such an optimal feedback law from the obtained open loop optimal solutions.

Feedback implementation of the TOE solutions-First order case
A feedback implementation of the TOE 1st-order problems amounts to expressing the optimal circadian input u � (t) as a function of the subject's circadian phase θ(t) and the reference circadian phase � yðtÞ. We have obtained and reported such feedback laws in our prior publication [20].  Entrainment time when the optimal input derived using a lower-order model is applied to the higherorder models. The top, middle, and bottom graphs represent the cases of circadian light intensity of 100 lux, 1000 lux, and 10000 lux, respectively. The graph labeled as "1st to 1st" represents the case when the result from the first-order model is applied to the first-order model, thus the optimal solution itself. The graph labeled as "1st to 3rd" represents the case when the result from the first-order model is applied to the full order model, and so on.

Feedback implementation of the TOE solutions-Full order case
Applying the idea above to the solution of full order TOE means we need to compute the optimal circadian light input I � as a function of the three states of the system (x, x c , and n) and the states of the reference system (� x, � x c , and � n). However, since the reference trajectory is periodic, we can replace the states of the reference system with its circadian phase θ r , where the definition of the circadian phase is given in Eq (11). Further, we aim for the feedback law to be independent of n (the state of the Process L), because the implementation of such feedback law would require measuring (or estimating) n, which has much faster dynamics than the circadian rhythm.
The procedure for constructing the feedback law is as follows: Step 1. For a given I max , collect a number of optimal trajectories ½x � ðtÞ; x � c ðtÞ; n � ðtÞ; y r ðtÞ; I � ðtÞ� each with different initial conditions ½x � ð0Þ; x � c ð0Þ; n � ð0Þ�.
Step 2. Generate N s data points by sampling the trajectories. That is, each data point , � � �, N s } is a point on a trajectory described in Step 1.
Step 3. Find an interpolation function F : For the results reported in this paper we use a feedforward neural network trained by gradient propagation and cross-entropy loss function with a single hidden layer of 32 neurons (implemented by the patternnet function in Matlab) to represent the feedback function F : R 3 ! f0; I max g.  Time optimal entrainment control for circadian rhythm We evaluate the robustness of the feedback controller in three ways, as described below. Cross-validation with new jetlag cases. We apply the learned feedback controllers on jetlag cases that are not used in the training data set. Specifically, we consider the jetlag cases with T lag 2 1 2 ; 1 1 2 ; � � � ; 23 1 2 � � . We compare performance of the optimal open loop solutions of the TOE as described in Sec. 4.3 for these cases with the respective performance of the learned feedback controllers. The results are shown in Fig 15. We can confirm that the feedback controllers indeed perform well in this cross-validation.
Robustness w.r.t. n(0). The feedback controllers ignore the influence of n(t) in computing the optimal control response. To evaluate the validity of the assumption that n(t) can be ignored, we use the feedback controllers under various initial conditions n(0). The results shown in Fig 16 confirm that n(t) plays a very small role in the performance of the feedback controlled system. Robustness w.r.t. change in I max . We train several feedback controllers for different I max values. As shown in Fig 14, there are significant differences between these controllers. To evaluate their generalizability, we use the learned feedback controllers with light inputs with different I max . Specifically, we evaluate the performance of the feedback controllers that are learned using data with I max = 10000 lux and I max = 1000 lux on circadian light source with I max = 5000 lux. The results are shown in Fig 17, which shows that these controllers can still perform well under the change in I max . For example, the feedback controller for I max = 10000 lux results in entrainment times that are less than 10 hours longer compared to the optimal open-loop solutions.

Discussion and conclusions
In summary, we developed a method for constructing an optimal feedback controller for the time-optimal entrainment problem for the Jewett-Forger-Kronauer (JFK) circadian rhythm Time optimal entrainment control for circadian rhythm model. Our approach is based on the calculus of variations, which enables us to formulate local optimality conditions for the solutions. It also enables us to define a gradient descent algorithm based on the local optimality criteria.

Comparison of our method and the method in [22]
The optimal entrainment results from our method and the method in [22] demonstrate that our method works as well as the method in [22] in minimizing the entrainment time. The optimal solutions found in this paper demonstrate that we do not encounter singular arcs. In other words, the optimal light input for the minimum-time entrainment of the JFK model is bangoff. The method in [22] is less complex because it has fewer decision variables (i.e., the switching times), but the assumption that the minimum-time optimal light input is bang-off does not always hold, as shown in [19] and [30], due to the existence of the singular arc in the optimal solution. Our method is more general because it does not assume bang-off solutions and, then, could find an optimal solution that contains some singular arcs.

On the relevance of lower-order circadian rhythm models
Because of the local optimality property, finding the global optimum would require us to search the solution space exhaustively for solutions that satisfy these conditions. While this procedure is not feasible for the full order JFK model, we demonstrate that it can be performed successfully on reduced-order models that we discuss in this paper. Further, the resulting optimal solutions can be used as initial guesses for the optimal solution of the full order model and then improved using the gradient descent algorithm. Therefore, effectively we use the lowerorder models as surrogates to the full order model to warm start the gradient descent search algorithm for the optimal solutions.

Impacts of model reduction on the TOE solutions
Two simplified versions of the JFK model have been discussed in this paper: (i) the 2nd-order model that is the result of ignoring the dynamics of the Process-L, (ii) the 1st-order model resulting from ignoring the dynamics of both the Process-L and the amplitude of the circadian oscillation. We evaluate the impacts of the model reduction on the solutions of the TOE problems by replaying the solutions for the lower-order models on the higher-order ones. The results indicate that for low light intensity, the solutions from the second-order model are practically the same as those from the full order model. Meanwhile, the solutions from the first-order model differ more significantly from those from the second-order model. This suggests that the amplitude dynamics is relatively more important than the Process-L dynamics in solving the TOE problems. Another indicator of the importance of the amplitude dynamics is the fact that in many optimal solutions for the second and full order models the oscillation amplitude is quenched along the way to reach entrainment. This is shown, e.g., in Fig 6 (bottom) and Fig 7 (bottom).

Potential impacts of amplitude suppression in circadian oscillator
The amplitude suppression phenomenon had been discussed in [29], whose experimental data showed that both the amplitude and phase of the circadian temperature were dramatically changed by the bright light pulses (7000*12000 lux). This helps explain why the circadian oscillator is dramatically quenched in I max = 10000 lux but almost remains on the limit cycle when I max = 100 lux during the minimum-time entrainment processes. The amplitude suppression in the circadian oscillator is closely linked to the minimum-time entrainment with bright light in this paper and previous literature. However, CBT amplitude suppression may also have potential connections with health-related issues: the experimental results in [31] indicates that the CBT rhythm tends to flatten with increasing age. The work in [32] also demonstrates that the quenched amplitude of the CBT is closely connected with sleep disruption. Therefore, the entrainment under very bright light with potential sleep disorders may be not suitable for every individual.

Robustness of the optimal feedback controller
We use a machine learning algorithm to learn an optimal feedback controller for the (full order) TOE problem using data collected from the optimal open-loop trajectories, which are computed using the approach above. In general, we discover that the optimal feedback controller depends on I max , the maximum circadian light intensity that is used for entrainment. We subsequently train separate optimal feedback controllers for three different values of I max .
However, we also demonstrate that the trained optimal controller is robust to some variations in I max . Further, the trained feedback controllers depend only on the states of the circadian oscillator, but not on the state of the Process-L. We find that the state of the Process-L has little effect on controller performance. The minimum-time optimal light strategies proposed in this paper can be implemented on the entraining travelers if the initial circadian states and local time in the destination are known. Although the light intensities of most outdoor lights are highly noisy and maintaining the same lux for hours could be unrealistic, some lighting devices (re-timer and light therapy glasses) and indoor light (the light in the hospital rooms and offices) can be used for offering stable light inputs. The robustness of entrainment against the variation in the light intensity and circadian states is improved by using the feedback entrainment strategy in Section 6, which requires the real-time measurement of the circadian states, i.e., core body temperature. For this purpose, some portable sensors had been developed for instantaneous measurements of the core body temperature [33]. The feedback implementation of the entrainment process is enabled by these sensors.