A computational framework for optimal control of a self-adjustive neural system with activity-dependent and homeostatic plasticity

The control of the brain system has received increasing attention in the domain of brain science. Most brain control studies have been conducted to explore the brain network’s graph-theoretic properties or to produce the desired state based on neural state dynamics, regarding the brain as a passively responding system. However, the self-adjusting nature of neural system after treatment has not been fully considered in the brain control. In the present study, we propose a computational framework for optimal control of the brain with a self-adjustment process in the eﬀective connectivity after treatment. The neural system is modeled to adjust its outgoing eﬀective connectivity as activity-dependent plasticity after treatment, followed by synaptic rescaling of incoming eﬀective connectivity. To control this neural system to induce the desired function, the system’s self-adjustment parameter is ﬁrst estimated, based on which the treatment is optimized. Utilizing this framework, we conducted simulations of optimal control over a functional hippocampal circuitry, estimated using dynamic causal modeling of voltage-sensitive dye imaging from the wild type and mutant mice, responding to consecutive electrical stimuli. Simulation results for optimal control of the abnormal circuit toward a healthy circuit using a single node treatment, neural-type speciﬁc treatment as an analogy of medication, and combined treatments of medication and nodal treatment suggest the plausibility of the current framework in controlling the self-adjusting neural system within a restricted treatment setting. We believe the proposed computational framework of the self-adjustment system would help optimal control of the dynamic brain after treatment.


Introduction
The goal of clinical treatment is to adjust the brain circuit to achieve a desirable brain function. The issue lies in how to optimize the treatment to induce the desired function effectively. In this respect, clinical treatment for the brain can be viewed as an optimal control problem.
Great advances have been made in treating brain circuitry with medications, dissection via a surgical operation ( Schreglmann et al., 2018 ), gamma knife radiosurgery or focused ultrasound treatment ( Park et al., 2017 ), invasive electrical stimulation such as deep brain stimulation (DBS) ( Park et al., 2015 ) and vagus nerve stimulation ( Yu et al., 2018 ), or non-invasive stimulation such as transcranial magnetic stimulation (TMS) ( Kar, 2019 ;Park et al., 2013 ) and transcranial direct current stimulation (tDCS) ( Bao et al., 2020 ;Brunoni et al., 2012 ;Pini et al., 2018 ). Despite notable advancement of brain treatment tools, each treatment's effects on the complex human brain system are not fully understood. We also lack a systematic understanding of the brain's reorganization as a response to treatment for each individual. Despite the criticality of the treatment planning, establishing a treatment procedure is highly restricted and relatively slow since diverse experiments are not allowed for the human brain. Thus, a computational framework to support op-timizing the treatment of the human brain is necessary before clinical application, particularly for individualized treatment.
Recently, the importance of the brain control has been receiving increasing attention in the neuroimaging community. Studies on the control of the brain have been conducted based on two perspectives: characterization of the network in terms of the controllability and optimal control of the network to induce a desired function.
To characterize the brain network (e.g., Liu et al., 2011 ), the network controllability has been evaluated over the (mostly linear) state dynamic equation in terms of graph-theoretic perspective. The control inputs affect the activity state at the all or a part of the nodes to induce the desired brain activity at all the nodes, without changing the network topology or parameters of the dynamic state equation. This approach has been conducted to explore differential characteristics of the network between mild traumatic brain injury and health control, between sexes or about heritability ( Cornblath et al., 2019 ;Gu et al., 2017 ;Lee et al., 2019 ).
The optimal control is to find an efficient way to change the nodal sensitivity or network connectivity (or the network topology by removing nodes or edges) to induce a desired system function. For this, diverse types of virtual brain stimulators have been used to predict treatment effects on the system function based on the dynamic neural state model. For example, Falcon et al. (2016)  cesses after chronic stroke with a dynamic state model, the parameters of which were estimated from fMRI signals. Simulation with a dynamic state model has been used to explore epilepsy and surgical interventions Jirsa et al., 2017 ;Olmi et al., 2019 ;Proix et al., 2017 ).
Both groups of brain control studies, however, do not pay sufficient attention to the brain system's self-adjustive procedure after treatment. For example, in the controllability of the brain network , the brain system was assumed to be a stable linear system that alters a brain state according to external perturbations. This is similar to brain control studies conducted to change the brain system. Most control models for the brain do not include self-adjustive neurobiological factors; the brain circuit can be modified not only by direct stimulus but also by secondary changes to the stimulus and self-protective processes, or homeostasis ( Keck et al., 2017 ).
Activity-dependent plasticity is the brain's fundamental ability to update its connectivity according to increases or decreases in activity. Many experimental studies have shown presynaptic plasticity in activating neurons ( Lachamp et al., 2009 ;Shaban et al., 2006 ;Shin et al., 2010 ;Weisskopf et al., 1994 ). Activity-dependent plasticity is a primary mechanism for diverse functions such as learning and memory ( Fusi et al., 2005 ;Kano et al., 2009 ). According to the activity-dependent plasticity, both the treated target region and its connected regions may well be regarded to undergo changes in the connectivity.
Homeostatic plasticity refers to the neural system's self-maintenance within a stable range and applies opposing forces on synapse strength, intrinsic excitability, and synapse number ( Fauth and Tetzlaff, 2016 ). Specifically, high activation of synapses induces synaptic down-scaling, spine loss, and dendrite retraction, which decreases the strength of synapses. On the other hand, in low activation of synapses, synaptic up-scaling, spine gain, and dendrite elongation promote and increase their strength. Homeostatic synaptic scaling is an important mechanism to maintain a single neuron or neuronal populations ( Turrigiano, 2012( Turrigiano, , 2008.
Despite the importance of these adjustment processes in the intrinsic neural circuitry, previous studies on the optimal control of the brain have not considered them in the modeling of the target system; therefore, an initial plan of treatment may not always lead to the best solution for curing brain disease. We propose a computational framework for the optimal control of the brain circuit in the context of the brain's self-adjustments after each treatment, i.e., activity-dependent plasticity and homeostatic plasticity.
The optimal control of the brain depends on the neurobiologically plausible model that responds to the external stimuli. The current optimal control framework is based on a computational model that has state dynamics over effective (directional) connectivity among neural populations. To make the optimal control practical, it calls for connectivity estimation from the observed neuroimaging data. In this respect, one may refer to computational modeling and model estimation to infer the effective connectivity of the target circuitry from the neuroimaging data, for example, dynamic causal modeling (DCM) ( Friston et al., 2003. In our previous study ( Kang et al., 2020 ), we estimated the effective connectivity of the rodent's hippocampus circuitry using DCM of the voltage-sensitive dye imaging (VSDI), while the wild type and mutant hippocampal circuits respond to the consecutive stimuli. The current framework for optimal control was presented for this neural circuitry model of the hippocampus, set to self-adjust its outgoing effective connectivity as activity-dependent plasticity after treatment, followed by synaptic rescaling incoming effective connectivity.
We optimized the nodal property (the maximal PSP) to achieve the desired system's function in processing a given stimulus. In this respect, the current optimal control framework is not exactly same as the conventional optimal control problem where treatments are delivered to the system via general input channels. Instead, the current framework optimizes inputs to change a part of model parameters to minimize the response function's difference to a stimulus between the desired system and the treated system. In this optimization, the framework first estimates hidden parameters for the system's intrinsic adjustment processes, based on which optimal treatment on the parameter is planned.
We conducted and presented simulations for optimal control to make the hippocampus of mutant mice behave like wild-type mice. Based on these simulation results, we argue that the brain system's selfadjustment after treatment is an essential part of optimal control of the dynamic brain.

Materials and methods
We conducted simulations to control the self-adjusting neural circuit, based on a neural state dynamic model of the hippocampal circuit. The details are described in the following orders: 1) basics of the neural state dynamic model, 2) hippocampal circuit as a test system, 3) treatment of the nodal sensitivity, 4) activity-dependent plasticity, 5) homeostatic plasticity, and 6) optimization of the brain control in diverse settings.

Neural state dynamics and observation models
We used a convolution-based neural state model ( Jansen and Rit, 1995 ;Moran et al., 2013 ) that describes the dynamics of information exchange over the asymmetrically connected neural circuit in terms of firing rate. It can be written as the following ordinary differential equations: Here, and indicate the membrane potential and cross membrane current of a neural population .
indicates the polarity of a neural population , which is assigned + 1 for excitatory neural populations and − 1 for inhibitory neural populations. and indicate inverse of decay time constant and maximal postsynaptic potential (PSP). The effective connectivity from a neural population to a neural population is denoted by . The sigmoidal (activation) function ( ) of a neural population describes the transformation of the average membrane potential to the average firing rate of action potentials, denoted by: with parameters of a maximal firing rate and a slope of the sigmoid function. 0 , is the PSP that achieves a 50% firing rate of a neural population ( Jansen and Rit, 1995 ). The external input to neural populations ( ) is multiplied by the input modulation parameter . Linear relations between VSDI signals and membrane potential have been reported in previous studies ( Berger et al., 2007 ;Chemla and Chavane, 2010 ). Accordingly, we applied a linear observation model for the VSDI data as a linear weighted sum of the membrane potential of neural populations: Here, v r represents the VSDI signal at a region r , composed of n neural populations. We used all the parameters, i.e., a scaling factor r of a signal at a region r , and contribution ratio ri to the VSDI signal in region r of the membrane potential at a neural population i, V ri , all of which were estimated in our previous study ( Kang et al., 2020 ).

The hippocampal functional circuitry
As a test system for optimal control, we used a computational model of the rodent hippocampal circuitry, estimated using DCM from the VSDI observed in response to consecutive electrical stimuli ( Kang et al., Fig. 1. The hippocampal functional circuitry for the mutant mouse group was used as a base system of all simulations. The effective connectivity among ten neural populations was estimated in Kang et al. (2020) from the experiment done by Bourgeois et al. (2014) . The neural populations are; E11 (granule), E12 (mossy), I11 (DG Basket), I12 (HIPROM), E21 (CA3c Pyramidal), I21 (CA3 Basket), I22 (CA3 O-LM), E31 (CA1 Pyramidal), I31 (CA1 Basket), and I32 (CA1 recurrent O-LM). External electric shocks u(t) were presumed to affect all regions of the stratum radiatum and dentate gyrus, E11, E12, I12, E21, I21, E31, and I31. Excitatory and inhibitory connections are colored red and blue, modified from Kang et al. (2020) . (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 2.
A schematic diagram for the current optimal control framework with cycles of treatments and the system's self-adjustment steps. (i) Treatment at a node is to increase or decrease the maximal postsynaptic potential H of the node, for simplicity. (ii) Each treatment affects all outgoing effective connectivity from the treated node as an activity-dependent plasticity process. For the altered effective connectivity, homeostatic scaling normalizes all the effective connectivity coming to each node to maintain the total incoming connectivity. Treatments are repeated until target behaviors are achieved. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) 2020 ). In brief, the computational model for the hippocampal slices of wild-type and mutant mice is composed of ten neural populations for three representative regions of the hippocampus (the hilus, the CA3 and CA1 regions). The ten populations are the granule, mossy, DG Basket, and HIPROM in the hilus; CA3c Pyramidal, CA3 Basket, and CA3 O-LM in the CA3 region; and CA1 Pyramidal, CA1 Basket, and CA1 recurrent O-LM cells ( Fig. 1 ). Based on the computational model, we estimated the effective connectivity in two different groups of mice from an open VSDI dataset ( Bourgeois et al., 2014 ). The dataset contains stimulation-locked VSDI of hippocampal brain slices of wild-type and epileptic aristaless-related homeobox gene ( Arx ) conditional knock-out mutant mice ( Arx -/+ ; Dlx 5/6 CRE-IRES-GFP ) ( Marsh et al., 2009 ). The two groups show different neural responses to the external consecutive electrical stimuli.
Under this test condition, we expected the system's adjustment on its effective connectivity based on activity-dependent plasticity and homeostatic plasticity by controlling the maximal PSP of a target node . See Fig. 2 . In all simulations, we started from a mutant mouse's connectivity, which is to be controlled to generate the healthy brain response of the wild type.

Controlling process
The treatment of the system is done by altering the maximum PSP (as a sensitivity) at the target node (or neural population) . * ⇐ + × , where is the ratio of increase (or decrease) compared to the maximal PSP at the target node and * is the final treatment effect at the target node. When is positive, the treatment is set to increase the sensitivity, while the treatment decreases its sensitivity for negative .

Activity-dependent plasticity
Increased sensitivity of a neural population induces more activity, which is assumed to cause an increase in synaptic connectivity with neural populations that the neural population projects to. We simplified the activity-dependent plasticity using the following equations: where , where, i indicates the number of iterations from the treatment. When a treatment of the pre-synaptic sensitivity ( H i ) was applied at the target node , the activity-dependent plasticity is applied to the edges (denoted as → ) that receive direct projections from the target node (first-degree neighborhood). Here, A Max represents a maximal connectivity range allowed for each edge after infinite iterations, and indicates an updated rate at each iteration after treatment, which was set to = 0 . 8 for simulations 1 -2, and = 0 . 5 for the other simulations.

Homeostatic rescaling for connectivity
Synaptic rescaling is done by creating constant total incoming connectivity at each iteration, by scaling the total incoming connectivity ( ) to the node at each iteration to be that of the initial stage (0) . The total incoming connectivity ( ) to the node at each iteration is given by: The rescaling is done for all the nodes with the following method: where ← indicates all the edges that project to the node . It should be noted that the sign of the inhibitory and excitatory effect can be seen in the dynamic equation Eq. (2) , and represents the absolute amount of projections or synapses. We hypothesized that synaptic rescaling is done to balance the number of synapses or synaptic buttons in the node , regardless of the excitatory or inhibitory neural types of incoming inputs.
To make it simple, we fixed a total number of iterations for activitydependent plasticity and homeostatic rescaling cycle to three (except for simulations to show treatment effects along the treatment number) in all simulations of the present study. We did not change self-recurrent effective connectivity in the adjustment process to eschew the potential instability due to the self-recursive plasticity in the recurrent connectivity and to avoid double alterations of the parameter for the treated node, i.e., the nodal sensitivity and its self-recursive connectivity.

Evaluation criterion
To evaluate the status of a treated system (a source system s) compared to that of the target system t , we used the mean square error ( MSE ) between activity signals of , (VSDI) from the two systems, which is defined as follows.
Here, R and T represent a total number of regions and temporal samples for the VSDI signals y . In the present study, the VSDI signals from the three regions, the hilus, and the CA3 and CA1 regions, were considered.

Optimization of brain control
In the optimal control of the brain circuit, one has to include the self-adjustment process of the target system in the planning, formulated with a function of hidden parameters. In the current study, we formulated activity-dependent plasticity of the system after treatment using a parameter , which is estimated from the treatment-response history data.
At the first treatment, no response data were available to estimate in the planning. Therefore, we utilized the MSE landscape of treatment ( ) versus activity-dependent plasticity ( ) for a given node , . By marginalizing the MSE along the , the optimal treatment ( * ) was chosen as below.
where and represent the VSDI signals of the target and simulated systems. and indicate lower and upper bounds of , which can be chosen by the operator in consideration of empirical ranges if available. The MSE landscape was estimated using a Bayesian optimization technique, which searches appropriate sampling points of parameters based on a model of the Gaussian process. Bayesian optimization is composed of two steps -approximating the objective function (using a surrogate model) and an acquisition function to decide where to sample (evaluate). The surrogate model incorporates prior belief about the objective function and updates the prior with samples evaluated from the function to derive a posterior, leading to a better approximation for the function. Based on the posterior of the objective function, the acquisition function determines the next sample for evaluation that is expected to improve the approximation best over the currently accumulated evaluations. In the present study, we adopted a Gaussian process model for the surrogate model ( Snoek et al., 2012 ) and an expected-improvement-per-second-plus function for the acquisition function ( Bull, 2011 ;Gelbart et al., 2014 ). We utilized the MAT-LAB function 'bayesopt' (Mathworks, co. USA) with a default acquisition function ('expected-improvement-per-second-plus'). In all Bayesian optimizations of the present study, 240 iterations were conducted. Based on the MSE landscape, we optimized * in the marginalized space of according to Eq. (11) .
Using the optimal treatment * , the initial system was treated by changing the system Eq. (5) ). The treated system undergoes systematic adjustment according to ( Eqs. (6) -( (9) ), and the effective connectivity A (t) is updated with the system's inherent parameter .
For the second treatment, we utilized the observed signal 1 , i.e., a response signal to the first treatment * , to estimate ̂b y minimizing MSE between the observed signal 1 and the signal generated with previous treatment * and a variable to optimize as below.
To optimize the next treatment, based on the estimated , we selected the optimal treatment parameter * that minimizes MSE between the target signal and the signal generated with the estimated system's parameter ̂a nd a variable to optimize, i.e., * = argmin ( , | , ̂) .
The signal is generated by applying Eqs.
(1) -( 4 ) with the parameters { H, A } from the system stabilized after several iterations for each treatment, derived according to the Eqs. (5) -( 9 ). To consider limited chances of clinical treatments, we restricted the number of treatments to three.
Note that and can be vectors if multiple treatments with different effects are used and if the system's adjustment process is formulated with multiple parameters.

Fig. 3.
An example of the self-adjustment process after treatment at a node, reflected in the transient effective connectivity A and its VSDI signals at the hilus, CA3, and CA1. (A) When the sensitivity of node E21 (red arrow) was increased by 20% ( = 0 . 2) as a treatment, the system undergoes adjustment steps by changing the initial system's effective connectivity as a process of the activity-dependent plasticity, followed by synaptic rescaling as a homeostatic process. This cyclic adjustment continues several iterations (three in this study) until it reaches the allowable (preset) range of the connectivity. The activity-dependent plasticity affects the outgoing connectivity from the treated node, which corresponds to the fifth column of the effective connectivity matrix (A1). Meanwhile, the homeostatic-scaling normalizes every row to be consistent before and after treatments or adjustments (incoming connectivity to each node, S1). Since the changes in the effective connectivity are relatively small compared to the initial strength of the effective connectivity, the difference ( Δ ) between the effective connectivity at the transient system ( A i ) and the initial system ( A initial ) is presented. (B) Progressive changes in the VSDI signals at the hilus (upper panel), the CA3 (middle panel), and CA1 (lower panel) are presented for three iterations of adjustment steps. The color intensities for VSDI signals represent the system's first transient state A1 to the stabilized final state after three iterations A3. The red and blue colors represent the results of the activity-dependent plasticity and homeostatic scaling respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Experiments and results
We conducted five simulation experiments to show the construct validity of the proposed framework. To estimate the system's adjustment parameter (i.e., activity-dependent plasticity), we considered two different situations; 1) when no treatment-response data is available for the treatment planning; 2) when a treatment-response paired data is available from the previous treatment. For the first case, it is necessary to choose a safe range of the treatment without a good knowledge of the activity-dependent plasticity parameter of the target system. In simulation 1, we showed that utilizing , if not accurate within a plausible range, is better for the optimal control than without it in the model of the target system. In simulation 2, we propose a method to estimate the strength of the treatment for the first treatment without treatmentresponse data by marginalizing the MSE landscape. In simulation 3, we extend the scheme of simulation 2 to a case with multiple treatments where a treatment-response data is available. In simulations 4-10, we applied the current optimal control framework to control the mutant system to achieve the wild type's behavior. In simulation 4, we optimized the target node to treat to induce the mutant's response to that of the wild type response. In simulations 5-6, we set up an analogy to medication treatment that regulates specific neural types (e.g., GABA antagonist for GABA neurons) without location specificity. Finally, in simulations 7-8, we treated the mutant system with combined medication and node-specific (i.e., specific neural population in a region) treatments. Based on the modeling of the nodal treatment for the medicated system, we conducted an experiment for a combined treatment of medication and a nodal treatment.

Simulation 1. The system's self-adjustment steps after treatment
We simulated a system that adjusts its connectivity with an arbitrarily chosen ground truth parameter = 0 . 16 (for activity-dependent plasticity) after treatment at node E21 by altering the maximal PSP H of the node E21 with a rate of = 0 . 2 . After treatment, the effective connectivity A was self-adjusted by the activity-dependent plasticity, followed by homeostatic scaling ( Fig. 3 A). Fig. 3 B shows progressive changes in VSDI signals of the system with transient effective connectivity A after treatment.
To test the effect of inaccurate parameter estimation for the groundtruth parameter for activity-dependent plasticity, we simulated systems with 16 potential values of ( = 0.00, 0.02, 0.04, …, 0.30) for two different types of treatments, i.e., increased sensitivity ( = 0 . 2 ) and decreased sensitivity ( = − 0 . 2 ) at node E21. We evaluated MSEs between the signal generated from the ground truth systems adjusted with the ground truth = 0 . 16 and those of the 16 systems. The activity-

Fig. 4.
Results of simulation 1 that show effects of including the activity-dependent plasticity in the model of optimal control. (A) The effective connectivity of two systems after self-adjustment iterations with a given = 0 . 16 for treatments at node E21 by = 0 . 2 and by = −0 . 2 are presented. The red and blue arrows indicate a treated node with increased and decreased sensitivity. (B) VSDI signals from three different systems that underwent self-adjustment with the ground truth = 0.16 (solid lines), with = 0 . 1 (dash-dot lines) and = 0 . 0 (dotted lines, without self-adjustment) are displayed for treatment with increased ( = 0 . 2 ) and decreased ( = −0 . 2 ) sensitivity at node E21. Note the effective connectivity is slightly different for the two systems so that it is not easily identifiable by colors. However, signals generated from the two effective connectivity are visually identifiable. (C) The MSE between the signals from the adjusted system with and those with various activity-dependent plasticity ( ) for the two treatments ( = 0 . 2 and = −0 . 2 ) are presented. Even though the ground truth cannot be directly estimated, it is advantageous to include the self-adjustment steps with within a plausible range (for example, = 0 . 1 ) in the optimal control of the system. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) dependent plasticity parameter = 0 . 0 indicates no self-adjustment after treatment. Figs. 4 A and 4 B display the stabilized effective connectivity matrix adjusted with = 0 . 16 after increased treatment ( = 0 . 2 ) and decreased treatment ( = − 0 . 2 ). The ground-truth self-adjusting system ( = 0 . 16) generated signals (after self-adjustments) different from the system without self-adjustment ( = 0 . 0) , particularly at the third and fourth stimuli at the hilus ( Fig. 4 B). For treatments = 0 . 2 and = −0 . 2 at node E21, MSE between signals from the system with the ground truth and systems with different adjustment ratio , are displayed in Fig. 4 C. If we consider the adjustment process within a plausible range (for example = 0 . 1 ), it approximates the target behaviors better than those without considering the adjustment process ( Fig. 4 C).

Simulation 2. Strategy for optimizing treatment strength at a target node without a treatment-response data using marginal MSE landscape
For the initial treatment, no empirical treatment-response data is available to infer the adjustment parameter of the target system. In this situation, to estimate the treatment strength without knowledge of a system's parameter , we constructed an MSE landscape spanned with possible ranges of and . The MSE for and is defined by the MSE between the signals of the target system and signals generated by a set of and . From this MSE landscape, optimal was chosen by marginalizing the MSE landscape along with plausible ranges of .
To test this scheme, we considered a ground truth target system with a history of alterations (adjustment) from the initial system. The system was assumed to have altered three times by damage or by any other reasons at node E21, i.e., ( 21 ) = 0.2, followed by activity-dependent plasticity with ( E21 ) = 0 . 16 in the first transition; ( 12 ) = −0 . 2 with synaptic plasticity (E12) = 0.172 in the second transition; ( 21 ) = 0 . 1 with (E21) = 0.160 in the third transition. The initial and target system's effective connectivity are shown in Fig. 5 A.
The goal is to have the initial system behave like the target system (VSDI signals). Using the Bayesian optimization method, we explored the MSE values between the final target signal and signals generated from various and parameters ( Fig. 5 B). The search ranges of parameters in the Bayesian optimization were set to [ − 0.5, 0.5] for and [0, 0.3] for . The minimum of the marginalized MSE along was found at * = 0.133 ( Fig. 5 C). The marginalized MSE increases rapidly after this optimal value. The MSE between the target and initial systems was 9.015, which was reduced to 4.304 after treatment, with the estimated * ( Figs. 5 D). Since we do not have prior knowledge for the system's adjustment property, the treatment plan is rather conservative, without expecting the treatment's maximal gain.

Simulation 3. Strategy for optimizing treatment strength at a target node using a treatment-response data
We simulated a sequential treatment case, where a treatmentresponse experimental data is available. Using the response signal to the first treatment, we estimated the activity-dependent plasticity pa- Fig. 5. Results of simulation 2 that explain the strategy for optimizing the treatment strength at a target node without empirical data for the system's adjustment process. (A) The initial system ( A initial ) and the target system ( A target ) and their difference are presented. The optimal control problem is to treat the sensitivity at node E21 to lead the initial system with ( A initial ) into a system that behaves like the target system ( A target ). (B) The MSE landscape spanned along the treatment and activity-dependent plasticity is derived by using the Bayesian optimization scheme. (C) The optimal treatment * = 0 . 133 was chosen to minimize the marginalized MSE along . (D) The VSDI signals from the target system (dotted lines) and the initial system (solid lines) are displayed in the upper panel. In the lower panel, VSDI signals from the target and treated systems are shown in dotted and solid lines, respectively. rameter of the system, ̂( Eq. (12) ), based on which a proper treatment * was chosen in the second treatment ( Eq. (13) ). For the initial and target systems used in simulation 2, we applied an optimization scheme for the optimal control of the system. In the case of the first treatment, since no information for the activity-dependent plasticity was available, we chose optimal treatment parameter * = 0.133 that minimizes the marginal MSE along (shown in simulation 2, Fig. 5 B and 5 C).
From the system that underwent iterative adaptive processes for the first treatment, we obtained VSDI signal as a stabilized response to the treatment. Using this signal, we estimated the optimal activitydependent plasticity parameter ̂= 0 . 16 that minimizes the MSE between the observed and the signal generated with the previous treatment * according to Eq. (12) ( Figs. 6 A and 6 B). Based on the estimated parameter ̂, we searched the optimal treatment parameter * = 0 . 17 to minimize MSE between the target system's response and the signal generated with a variable and priorly estimated ̂a ccording to Eq. (13) ( Fig. 6 C). After the second treatment, the final system reproduced target VSDI signals with MSE as low as 0.287 ( Fig. 6 D, the third row).
In the biological system, the activity-dependent plasticity may vary by each treatment. To treat the system with time-varying plasticity, we added a re-estimation step for the activity-dependent plasticity parameter at each iteration. We simulated three systems with different timevarying characteristics in the self-adjustment parameter ; 1) decreasing plasticity − by 0.05 from the initial = 0.16, 2) a fixed plasticity (the identical case of the previous simulation in Fig. 6 ), = 0.16, and 3) increasing plasticity + by 0.05 for each increasing treatment number. We used the same initial and desired systems, and the same optimization scheme for the first treatment as the previous simulation 2 ( Fig. 7 A). Compared to the previous simulation, a step for re-estimating the system's activity-dependent plasticity parameter was conducted by using the treatment-response data set. Based on the estimated parameter, the optimal strength of the treatment * was determined. When the optimal treatment strength becomes zero ( = 0), the system is considered to achieve the best outcome, and the treatment stops. This simulation suggests the current control framework can be used in more wide systems with different self-adjustment characteristics. Having checked this capability, we fixed the plasticity parameter in all the following simulations to simplify simulations.

Simulation 4. Optimal node selection for treatment to induce the mutant system to behave like the wild type system
We applied the optimal control framework presented in simulation 3 to each node of the mutant hippocampal circuits and searched the optimal node in making the mutant system behave like the healthy wild type system. We set the mutant and wild type as an initial system and a target system with adjustment plasticity = 0.15 at all nodes. Here, we assumed that the adjustment parameter is inherent, and is not altered by any treatment. In the Bayesian optimization, the search ranges for parameters and were set to [ − 0.5, 0.5] and [0, 0.3]. The ten treatments were initially planned, but it stopped when treatment did not improve the MSE ( = 0). Figs. 8 A and 8 B show the initial effective connectivity of the mutant system and the difference of the target wild type effective connectivity from that of the mutant (initial) system. Using the optimal control scheme presented in simulation 3, each node was treated one by one. According to the previous optimal control scheme, treatment at node E21 was optimal when treated with ( 21 ) = 0 . 051 , ( 21 ) = 0 . 220 , and Fig. 6. Results of simulation 3 that illustrate the strategy for optimizing treatment strength at a target node based on the responses obtained after the previous treatment. (A) The entire process used in simulation 3 is explained in Equations. For the first treatment, the optimal treatment * = 0 . 133 was chosen to minimize marginalized MSE along (see Fig. 5 C). (B) For the subsequent treatment, by utilizing the response signals 1 to the first treatment with * , the activity-dependent plasticity ̂= 0 . 16 was estimated by minimizing MSE between the response signal 1 and signals generated with * and ̂. (C) The optimal parameter for the second treatment * = 0 . 170 was estimated based on the estimated ̂. (D) Differences of effective connectivity of the target system ( A target ) from the initial system ( A initial ), transient systems ( A r1 , A r2 ) after treatments 1 and 2 are displayed with their optimal treatments * . The signals of the initial system (upper panel) and systems after the first (middle panel) and second (bottom panel) treatments are shown.
( 21 ) = 0 . 180 for each subsequent treatment. Treatment at node I12 was optimal when treated with ( 12 ) = −0 . 051 , and ( 12 ) = −0 . 001 for the first and second treatments. Figs. 8 C and 8 D show examples of the changed effective connectivity after treatments at node E21 and I12. The VSDI signals from the initial mutant and targeted wild type systems, and VSDI signals at the system after treatments at E21 and I12 were presented in Figs. 8 G and 8 H. For the system with = 0.15, the best treatment target node was E21, as it showed the lowest final MSE (MSE = 9.679) ( Fig. 8 E).
We also tested this scheme for diverse systems that possess different adjustment plasticity values: = 0.0, 0.07, 0.15, 0.2, and 0.3 ( Figs. 8 E and 8 I). The node E21 was not the optimal target node for the systems with very small adjustment plasticity = 0 . 0 and = 0 . 07 . However, for the systems that have larger plasticity values: = 0.15, 0.2, and 0.3, E21 was the optimal target to make the system behave like wild type.

Simulation 5. Neural-type specific treatment as an analogy of medication
We simulated treatment with a medication, which affects drugspecific neurons distributed throughout the system. We set the mutant and wild type hippocampal circuits as an initial system and a target system with adjustment plasticity = 0.15 for all nodes, as in the simulation 4.
We simulated the effects of GABA receptor agonists and antagonists by changing the maximal PSP parameters ( H i ) of all inhibitory neurons. To distinguish the effects of drug treatment from those of the nodespecific treatment α, we used to represent the amount of drug effects (or drug doses in terms of optimal control). Fig. 9 A shows the mutant system's initial effective connectivity and Fig. 9 B shows the effective connectivity differences between the target wild type and the mutant (initial) systems, and between the stabilized system after treatment and the initial system. By marginalizing the MSE landscape ( Fig. 9 C), the optimal amount of medication treatment, * , was chosen ( Fig. 9 D). The VSDI signals from the initial mutant and the target wild type systems, and VSDI signals at the system after treatments were presented in Figs. 9 E and 9 F.

Simulation 6. Sequential medication treatments with optimization at each trial
As an extension of previous simulation 5, where the mutant system was treated with one-time medication, we simulated sequential medication treatments. For the first treatment, where a treatment-response experimental data is not available, we estimated an optimal medication dose by minimizing marginalized MSE along , and obtained * = − 0.1 (explained in Figs. 9 ). After the system's adaptive process for the first treatment, VSDI signal was obtained from a stabilized system. From this data, we estimated the optimal activity-dependent plasticity parameter ̂= 0 . 15 , with which MSE between the observed and the signal generated with the previous treatment * was minimized ( Figs. 10 A and 10 C). In the subsequent treatments, based on the estimated parameter ̂, we searched the optimal treatment parameter * to minimize MSE between the target system's response and the signal generated with a variable and the estimated ̂ ( Fig. 10 D). The MSE between the target and finally stabilized system was 24.503 ( Fig. 10 E,  the third row). The VSDI signals from the target system (dotted lines) and the initial system (solid lines) are displayed in the left panel. The strength of the first treatment * = 0 . 133 was determined by previous simulation 2 (see Fig. 5 C). The treatment reduced the MSE between the desired and the treated systems from 9.015 to 4.304 (right panel). (B) The MSE curves for the three different systems are presented along the treatment from the initial 9.015. In the system with decreasing plasticity parameter − , the system converged to MSE = 2.31 after three treatment. In the fixed and increasing plasticity systems with and + , the systems became optimal after two treatments and their final MSE were 0.29, and 1.09. (C-E) The VSDI signals that were obtained after treatment 2 and 3 are presented for each system. The second treatment strength was estimated with activity-dependent plasticity ̂= 0 . 16 , which was obtained by minimizing MSE between the response signal 1 and signals generated with * and ̂ ( Fig. 6 B). (C) In the decreasing plasticity system, the third treatment's optimal treatment strength was * = 0.06, and the final treatment reached the minimal MSE = 2.31. (D-E) In the fixed and increasing plasticity systems, after the second treatments, the systems converged to MSE = 0.287 and MSE = 1.09, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Simulation 7. Combined treatments with a neural-type specific treatment and a single node treatment
In this simulation, we simulated combined treatments using both medication treatment and a single treatment to control the mutant system. Medication affects the type-specific neurons distributed globally throughout the system as in simulations 5-6 and node-specific treatment affects locally as shown in simulation 4.
We restricted the range of medication effects within − 0.2 ~0.2, and explored optimal strengths of the combined treatments, medication that affects all inhibitory neurons and a single node treatment at node E21. Node E21 was chosen as a target of the single node treatment since it showed better improvement compared to the other nodes. For the first treatment, where a treatment-response experimental data is not available, we estimated an optimal medication effect ( * ) and node treatment ( * ) by selecting and at the minimal point of the marginalized MSE landscape MSE(y t , y s | , , ) along , i.e., * , * = argmin , ∫ ( , | , , ) , and obtained * = − 0.1 and * = 0.15 ( Figs. 11 A and 11 B). VSDI signal from a stabilized system after the first treatment was used to estimate the system's activity-dependent plasticity parameter ( Eq. (2) in Fig. 11 A). We obtained ̂= 0 . 15 that minimizes MSE between the observed and the signal generated with the previous combined treatment * , * , for a variable ( Figs. 11 A and 11 C); ̂= argmin ( 1 , | * , * , ) .
Based on the estimated parameter ̂, we explored the optimal treatment parameter * and * of subsequent treatments to minimize the MSE between the target system and the stabilized system after treatment ( Figs. 11 D and 11 E); * , * = argmin , ( , | , , ̂) . The MSE value between the target and finally stabilized system was 1.793 ( Fig. 11 E, the fourth row), which was the smallest MSE compared to the Fig. 8. Results of simulation 4 that show a strategy for optimal node selection to induce the mutant system to behave like the wild type system after (up to) ten treatments. (A) Effective connectivity of the initial system (mutant) with an adjustment parameter = 0.15 is displayed. (B-D) Differences of effective connectivity from the initial mutant system ( A initial ) are shown for the target (wild type) ( A target ) (B) and the systems stabilized after final self-adjustments ( A treated ) for the treatments at node E21 (C) and node I12 (D). (E) For a system ( = 0.15), the final MSE that the optimal treatment at each node can achieve is displayed. (F-H) The VSDI signals from the initial (mutant) and target (wild type) systems (F), signals from the target and stabilized systems after treatments at node E21 (G) and I12 (H) are exemplarily displayed. The target wild type system signals are plotted with the dotted lines and signals from the initial system and stabilized systems are plotted in solid lines. The blue, red, and yellow colors represent signals from the hilus, CA3, CA1. (I) For the systems with different adjustment plasticity settings = 0 (without considering the plasticity), 0.07, 0.20, and 0.30, the final MSEs that the optimal treatment at each node can achieve are displayed. * and * indicate the treatment number and the MSE that an optimal nodal treatment at the node reached the minimum MSE. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) best optimal single node treatment (simulation 4) and to the medication only (simulation 6).

Discussion
We presented a general framework for optimal control of the brain circuitry. The optimal control of the brain has many inherent challenges in clinical treatment, not only due to ethical issues but also due to the complexity of the brain. Because of ethical and practical issues, research remains mostly theoretical and based on virtual systems. Several models have been proposed to control the brain, including the graphtheoretic controllability perspective ( Karrer et al., 2020 ;Lee et al., 2019 ;Stisoet al., 2019 ;Tang et al., 2017 ) and the optimal control perspective of the dynamic state model Falcon et al., 2016 ;Jirsa et al., 2017 ;Olmi et al., 2019 ;Proix et al., 2017 ). Although the two approaches are similar in perturbing the system ( Menara et al., 2018 ), the purposes differ, i.e., to characterize the system in the graph-theoretic perspective or to find an optimal way to alter the system to generate the desired function.
Despite the differences in these models, they all assume a linear or passive nonlinear brain system without considering the system's dy-namic configuration after treatment. In the current study, we introduce the brain's self-adjusting nature that responds non-linearly to external treatments or perturbations, in terms of activity-dependent plasticity and homeostatic plasticity. Based on the estimation of unknown parameters for these adjustment processes, we propose an optimization framework to treat plastic brain circuitry. In the brain system, activity-dependent plasticity refers to functional and structural changes that arise from endogenous experience or sensation, which is evident in the literature. For example, the functional response map may contract or expand according to reduced or increased input to the brain at the macroscopic level ( Jones, 2000 ). These macroscopic changes can be induced by changes in synaptic efficacy ( Andreae and Burrone, 2014 ), which occurs at either the postsynapse or pre-synapse sides ( Yang and Calakos, 2013 ). In electrophysiology, the increased activity-dependent synaptic changes are reflected in the long-term potential (LTP) and spine gain, whereas decreased activity-dependent synaptic changes induce long-term depression (LDP) and spine loss ( Fauth and Tetzlaff, 2016 ). The mechanism of activitydependent plasticity is still not fully understood at the microscopic level yet. It may be associated with Hebbian plasticity ( Bienenstock et al., 1982 ;Sanger, 1989 ), which describes the strength of a synapse between  ( Fig. 9 ). (C) For the subsequent treatment, by utilizing the response signals 1 obtained after the first treatment with * , the activity-dependent plasticity ̂= 0 . 15 was estimated by minimizing the MSE between the response signal 1 and signals generated with * and diverse . (D) The optimal parameter for the second treatment * = −0 . 03 was estimated based on the obtained ̂. (E) Differences of effective connectivity between the systems after medication treatment 1 and 2 and the initial system are displayed with the optimal treatment * , which is estimated at each treatment. The signals of the initial system ( A initial , upper panel), and systems after the first ( A r1 , middle panel) and the second ( A r2 , bottom panel) treatments are shown. The signals from the target system are displayed with dotted lines and those from the initial and stabilized systems after treatments are displayed in solid lines. The blue, red, and yellow lines represent signals from the hilus, CA3, and CA1, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) two neurons depending on their correlated activity ( Hebb, 1949 ). Postsynaptic receptors or receptor efficacy may increase due to synchronous firing in the neural population level (multitudes of neurons may compose a neural context in the form of PSP in the post-synaptic neuron and thus, the incoming firing from a pre-synaptic neuron may induce firing in the post-synaptic neuron together). Increased neurotransmitter release from the pre-synaptic membrane due to increased sensitivity may increase neuroreceptors in the post-synaptic neurons. Based on this neural mechanism, the treatment effect in the current simplified model is constructed with the following logic: when a nodal activity is increased due to increased sensitivity, increased activation occurs in the connected regions and thus leads to increased connectivity between the two nodes, leading to changes in the system's behavior. Homeostatic plasticity opposes increases in synaptic strength, intrinsic excitability, and synapse number ( Fauth and Tetzlaff, 2016 ).
Since homeostatic plasticity works as a negative feedback process ( Turrigiano and Nelson, 2000 ), it contributes to stabilizing neural and circuit activity. Many studies have uncovered the mechanism of homeostatic synaptic scaling in cell cultures in vitro ( Turrigiano et al., 1998 ) and in vivo ( Diering et al., 2017 ;Keck et al., 2013 ) and have shown the essential role of homeostatic synaptic scaling in forming memory ( Turrigiano and Nelson, 2000 ). Diering et al. (2017) reported that homeostatic scaling-down occurs during sleep and contributes to the remodeling of synapses that participate in contextual memory consolidation. The remodeling of synapses can be done by controlling the trafficking of glutamate receptors such as the AMPA-type glutamate receptors (AMPARs) to modulate synaptic transmission efficiency ( Shepherd and Huganir, 2007 ). Increasing or decreasing numbers of AM-PARs according to homeostatic demand induces strengthened or weakened synapses ( Diering and Huganir, 2018 ). To incorporate homeostatic plasticity in the model, we applied the rescaling of incoming connections ( Turrigiano, 2008 ) to keep the total number of synapses or receptors constant.
Activity-dependent plasticity and homeostatic plasticity do not work independently from each other. It has been suggested that they work together, which can explain various system functions such as memory formation and consolidation ( Diering et al., 2017 ), recovery processes after brain damages ( Malone and Felling, 2020 ;Murphy and Corbett, 2009 ), and development ( Turrigiano, 2012 ;Turrigiano and Nelson, 2004 ). In the recovery process of the brain from a stroke, the reconstruction of brain circuits is associated with Hebbian and homeostatic plasticity ( Malone and Felling, 2020 ;Murphy and Corbett, 2009 ), in a process similar to the development of circuits ( Turrigiano, 2012 ;Turrigiano and Nelson, 2004 ).
Although activity-dependent plasticity and homeostatic plasticity have been investigated at the microscopic level, simplifying the process remains a crucial challenge in computational modeling. In contrast to Hebbian plasticity, which has widely been used in machine learning and computational neuroscience due to its effectiveness and simplicity, the formulation for activity-dependent plasticity has not been thoroughly researched. This study simplified the homeostatic process by scaling the total amount of synaptic properties despite its complexity at the molecular or cellular level. The details of modeling the biological reality for these adjustive processes could be clarified in future studies.
By simulation, we argue that the self-adjustment plasticity should be considered in the treatment plan. The simulation 1 (and Fig. 4 ) showed that the control model with a self-adjusting plasticity term ( > 0.0), if not highly accurate, is better than without the plasticity term ( = 0.0) in producing the desired function. Simulation 4 ( Fig. 8 ) also confirms the need for plasticity term in the control model. In simulation 4, the best node to control without considering the plasticity was not optimal and showed higher MSE than the treatment with the plasticity term. If not a direct clinical evidence, the resistive process in the clinical treatment may support the current argument. Several studies have argued that the individual differences in the treatment may be associated with diversity in the treatment-resistant process, found in various brain disorders such as schizophrenia ( Patel et al., 2014 ;Potkin et al., 2020 ), depression ( Bennabi et al., 2019 ), and Parkinson's diseases ( Vorovenci et al., 2016 ). These studies suggested that optimal treatments for individuals need to consider the target systems' treatment-resistant processes. Similarly, we argue the activity-dependent and homeostatic plasticity is a factor to consider in the treatment as those types of plasticity are widely known in the neurobiological system.
Considering various situations of clinical treatments, we presented simulations corresponding to major clinical treatment settings. We primarily focused on treating a single brain node (region) since a choice in multiple nodes is generally difficult in the clinical setting, except for some special cases using TMS, which allow treating different nodes at different times but is less effective than other treatments. We also limited the treatment target to a node (intra-regional circuit) rather than an edge (inter-regional circuit) since most clinical treatments are primarily associated with the regulation of a target region (or local circuitry). We disregarded some cases involving white matter treatments such as callosotomy or cingulotomy. In treating a single node (a local neural population), we conceptualized the regulation of maximal PSP using as a representation of sensitivity at the node. When the maximal PSP at a node increases, the node may induce action potentials more frequently and deliver increased signals (firing rates) to post-synaptic neurons. In clinical treatment, changing the local sensitivity of a target node may be possible through invasive electrical stimulation, non-invasive highfrequency TMS, or medications that enhance or suppress neural activity at the node. Functional neurosurgery is one of most common treatments in altering nodal sensitivity. If the surgical treatment resects a specific region completely, the plasticity may occur among neighbors to compensate for the deficit caused by the lesion. In much of functional neurosurgery such as thalamotomy, a treated region is not entirely removed, but a portion of the region is lesioned to alter the regional sensitivity or activity. To show the current optimal control concept, we didn't specify the exact way to achieve the desired maximal PSP. How to achieve the desired treatment level and evaluate the treatment level are still clinical challenges in the practical application.
We also simulated medication, which regulates neural-type dependent diffuse effects on a neural system. In particular, we simulated a medication treatment analogous to GABA receptor antagonists or agonists. Considering limited regional or functional specificity of medication (i.e., all inhibitory neurons do not play the same function), combining medication and single node treatment provides more freedom to control a brain circuitry. This was evident in the simulation 6. When we compared the best-stabilized system for each treatment, the best result was obtained from the combined treatment (simulation 7) than a single node treatment or medication only. By adopting combined treatments, we could also reduce the drug dose, which may decrease the high dose medication's side effects.
In modeling the system's response to an external treatment, we abstracted the system's adjustment process using a parameter . Besides this simplification, we also assumed that is constant across trials. However, we showed that we could estimate by using a response signal after each treatment. By doing this, we can apply the current control strategy to the state-dependent or treatment-dependent changes in the system's plasticity.
Reliable construction of a neural circuit to be treated is another big challenge in clinical treatment. The current framework of optimal control can be practical when we know the system's mechanism, i.e., how it works. The mechanism is often formulated with state dynamics over the effective connectivity among brain regions. Since the functional circuit to be treated is not generally accessible, the effective connectivity of an individual brain circuit should be determined using empirical data for each individual. In this respect, DCM may play an essential role as it provides a Bayesian solution to estimate each individual's effective connectivity using the observed data. In this study, DCM was used to infer the target circuit's effective connectivity from VSDI signals of the rodent's hippocampus. For the optimal control, reliable estimation of a neural circuit using observed signals or other prior knowledge is necessary, particularly for individuals in the clinical setting.
This study is designed to restore the normal response to a stimulus or function, rather than matching the abnormal connectivity of the mutant mice to that of the wild type directly. According to the top-down and bottom-up signaling in the hierarchical brain framework ( Friston, 2008 ), the hippocampal circuitry generates signals (for a stimulus) required by the other neural populations in different layers. Here, we view each nodal activity as a function that transforms a stimulus into a signal for other brain regions and is determined by the parameter. Thus, the treatment issue is to make the system generate the desired function, which is represented by the activity for a given stimulus. In this respect, we chose to match the response activity to that of the desired system, not necessarily matching the same connectivity of the desired system, based on the degeneracy principle ( Edelman and Gally, 2001 ) -the same activity can be produced by diverse circuitry ( Marder and Goaillard, 2006 ). The current framework may guide the direction of treatment in the case of epilepsy. Instead of treatment or prevention of seizures in the mouse model of epilepsy, one may focus on the alteration of the epilepsy-associated (interictal) circuit dysfunction itself, for example, by normalizing response to stimulation at a certain step of the cascaded signaling hierarchy in the epileptic mice.
Although we propose a framework of optimal control to match clinical situations, we acknowledge that clinical treatment is more restrictive than is assumed in the current framework. We cannot disregard the validation issues of the current framework using empirical data. It requires longitudinal data for treated populations using diverse treatment strategies. With more clinical data and scientific knowledge, a more biologically plausible model can be established. Conversely, the modeling and prediction steps would increase our understanding of how the brain works. It may be a long journey before the optimal control of the brain can be applied to clinical practice. Nevertheless, the current conceptual framework and simplified simulations take a first step toward the final goal of optimal control of the brain.
In summary, we propose an optimal treatment framework by introducing the estimation of adjustment processes in the brain system after treatment. Simulation results showing the responses and movement of an abnormal circuit towards a healthy circuit in diverse testing sets suggest the plausibility of the framework and the possibility of optimal control within a restricted treatment environment. Although further research on a more complex system should be conducted for clinical purposes, we believe the proposed computational framework of the adjustment system would help optimal control of the dynamic brain after treatment.