Dynamic neural reconfiguration for distinct strategies during competitive social interactions

Information exchange between brain regions is key to understanding information processing for social decision-making, but most analyses ignore its dynamic nature. New insights into these dynamics might help us uncover the neural correlates of social cognition in the healthy population and understand the malfunctioning neural computations underlying dysfunctional social behavior in patients with mental disorders. In this work, we used a multi-round bargaining game to detect switches between distinct bargaining strategies in a cohort of 76 healthy participants. These switches were uncovered by dynamic behavioral modeling using a hidden Markov model. Using a novel model of dynamic effective connectivity that we propose to estimate the information flow between key brain regions, we found a stronger interaction between the right temporoparietal junction (rTPJ) and the right dorsolateral prefrontal cortex (rDLPFC) for strategic deception than for the social heuristic strategies. The level of deception was associated with the information flow from Brodmann area 10 (BA10) to the rTPJ, and this association was modulated by the rTPJ-to-rDLPFC information flow. These findings suggest that dynamic bargaining strategies are supported by dynamic reconfiguration of the rDLPFC-and-rTPJ interaction during competitive social interactions.


Introduction
Competitive social interaction is a common situation in which people compete with one another for a finite resource or a common objective Swab and Johnson (2019). When these interactions repeat many times, participants often dynamically switch between different strategies, such as reputation-building and reward-collecting, usually with a long-term goal of maximizing self-interest Camerer and Weigelt (1988). These … that these dynamic strategies for social interaction are supported by dynamic reconfiguration of the functional interactions among these brain regions Yang et al. (2020). However, owing to the technical limitations of commonly used dynamic modeling approaches Calhoun et al. (2014), there have been few studies on the dynamics of these functional interactions and their behavioral association with strategic sophistication. New insights into this topic might help us uncover the neural correlates of social cognition in the healthy population and understand the malfunctioning neural computations underlying dysfunctional social behavior in patients with mental disorders Brüne and Brüne-Cohrs (2006).
Fig. 1. Two-party bargaining game and dynamic behavioral strategy. (A) Task design: the "buyer" is given the private value of a hypothetical object and is then asked to "suggest a price" to the seller (values and prices are integers, 1-10). The seller receives the suggested price and is asked to offer a price. If the offered price is less than the private value of the object, the trade is executed: the seller receives a reward equal to the offered price, and the buyer receives a reward equal to the private value minus the offered price; otherwise, the trade does not occur. Buyers and sellers do not receive feedback after each trial. (B) Dynamic bargaining strategy of a buyer (Subject ID 64) during the 60 rounds. True values (red) and suggested prices (blue) are plotted. Scatter plots of the true value against the suggested price are shown for each behavioral window, together with a least-squares line fitted to the data. (C) Positive slope for the incremental strategy, which shares the reward with the seller. (D) Negative slope for strategic deception, which tries to maximize the buyer's own reward. (E) Near-zero slope for the conservative strategy, which does not communicate any information to the seller during the game.
To probe the neural bases of the dynamics in social decision-making, here we used a self-paced, multi-round social interaction game (Fig. 1A) Bhatt et al. (2010). In each round of the game, a buyer is first informed by the computer of the true value of a virtual item and then suggests a price to the seller, who is selling the item. The seller will sell for any positive price. The seller can infer the prior probability distribution of possible values but is not informed of the buyer's trial-specific value. The seller makes a price offer, and if the offer is below the value, a sale takes place (but this outcome is hidden from the players, who do not receive any feedback). The final earnings from sales were reported at the end of the game and paid to the subjects.
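The payoff rule of Fig. 1A can be written as a small function. This is a minimal sketch, assuming that a sale occurs only when the seller's offer is strictly below the buyer's private value and that the buyer's suggested price is purely a signal with no direct effect on payoffs; the function name `trial_payoffs` is hypothetical and not taken from the authors' code.

```python
def trial_payoffs(value: int, offer: int) -> tuple:
    """Return (buyer_reward, seller_reward) for a single round.

    Assumption: values and offers are integers from 1 to 10, and a sale occurs
    only when the seller's offer is strictly below the buyer's private value.
    The buyer's suggested price is a signal only and does not enter the payoff.
    """
    if not (1 <= value <= 10 and 1 <= offer <= 10):
        raise ValueError("values and prices must be integers from 1 to 10")
    if offer < value:
        # Trade executes: the seller earns the offer, the buyer keeps the surplus.
        return value - offer, offer
    # No trade: neither player earns anything in this round.
    return 0, 0


# Usage: total earnings over a few made-up (value, offer) rounds.
rounds = [(8, 5), (3, 6), (10, 2)]
buyer_total = sum(trial_payoffs(v, o)[0] for v, o in rounds)
seller_total = sum(trial_payoffs(v, o)[1] for v, o in rounds)
```

Because no feedback is given, neither player observes these per-round outcomes during the game; only the accumulated totals are revealed at the end.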
Our focus was on the strategies that buyers used to suggest prices based on their private trial-specific values. In our sample, we observed three types of bargaining strategies Bhatt et al. (2010): 1) "conservatives", whose suggested prices revealed no information to their partners; 2) "incrementalists", who anchored their social signals (i.e. the suggested prices) to the true values of the items (as evidenced by a high correlation between values and prices); and 3) "strategists", who used a more sophisticated strategy by mimicking the incrementalists. That is, the strategists generated a series of prices with variability similar to the prices suggested by incrementalists, in order to build a reputation in their partners' minds that their prices revealed information about value. However, the strategists suggested low prices for the most highly valued items (to earn a lot in those trials, i.e. reward-collecting) and high prices for low-value items (which are not very profitable but necessary for reputation-building). Their values and prices are therefore negatively correlated. Theoretically, the existence of these three strategy types has also been predicted by a Bayesian model of belief formation in which each type possesses a different depth of theory of mind, and no other strategies have been predicted Bhatt et al. (2010).
In a previous analysis of this game, we considered the last 30 rounds of the game to reveal a single stable strategy Bhatt et al. (2010). Each subject was classified into one of the above three strategic groups by the extent to which their suggested prices revealed the true value of the bargaining item Bhatt et al. (2010). Compared with the other two groups, we found greater activity of the left rostral prefrontal cortex [rPFC or Brodmann area 10 (BA10)] in the strategic group Bhatt et al. (2010), suggesting that long-term goal maintenance is necessary for strategic deception. Stronger information flows from both the dorsal anterior cingulate cortex and the retrosplenial cortex to BA10 were associated with a higher level of deception during the game Luo et al. (2017), which further highlighted the involvement of cognitive control systems in strategic deception. Apart from BA10, the rDLPFC has been associated with self-interested behavior, as its cortical thickness has been shown to be negatively associated with prosocial giving to strangers in the dictator game, but not in the ultimatum game Yamagishi et al. (2016). There is also evidence that the rTPJ is essential for integrating others' beliefs into one's own strategic choice, since its causal interruption by repetitive transcranial magnetic stimulation (rTMS) reduced the ability to model the other's belief Hill et al. (2017). However, the functional role of the rDLPFC-and-rTPJ interaction remains unclear, mainly owing to its dynamic nature during competitive social interactions. In our sample, the rTPJ was dynamically engaged in strategic deception Bhatt et al. (2010), i.e. the rTPJ became more activated when the strategists switched from reputation-building to reward-collecting. Not only was the engagement of the rTPJ dynamic, but the bargaining strategy of the buyers also switched dynamically during the game (Fig. 1B-E). This dynamic switch of strategy might be supported by a corresponding dynamic reconfiguration of the rDLPFC-and-rTPJ interaction. Such a highly time-varying interaction therefore requires a dedicated model to reveal its dynamics.
Unlike previous studies, which assumed that each participant used only one strategy during the whole game, we relaxed this assumption by investigating dynamic switches between strategies from trial to trial using a hidden Markov model (HMM). The hidden state at each trial was defined as one of the three strategies introduced above: conservative, incremental, or strategic. The Markov property was assumed: given the state of the current trial, the state of the next trial is independent of the previous trials. The observation at each trial was calculated from the association between the suggested prices and the true values in a window of seven adjacent trials centered on the current one. When adjacent trials shared the same hidden state, they naturally constituted a behavioral window of time during which the buyer adopted the same bargaining strategy.
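The per-trial observations can be sketched as follows. The seven-trial window centered on the current trial follows the text; using the Theil-Sen estimator (median of pairwise slopes) as the "robust fit slope", truncating windows at the start and end of the game, and returning a correlation of 0 when the prices are constant are our assumptions. The Spearman p-value, the third observation listed for Fig. 2A, is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Rank data from 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman_rho(x, y):
    """Spearman correlation; returns 0.0 if either input is constant (assumption)."""
    rx, ry = average_ranks(x), average_ranks(y)
    if rx.std() == 0 or ry.std() == 0:
        return 0.0
    return float(np.corrcoef(rx, ry)[0, 1])


def theil_sen_slope(x, y):
    """Median of pairwise slopes, used here as a stand-in for a robust fit."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    dx = x[j] - x[i]
    keep = dx != 0
    if not keep.any():
        return 0.0
    return float(np.median((y[j] - y[i])[keep] / dx[keep]))


def window_features(values, prices, half_width=3):
    """Per-trial (slope, rho) over a centered window of 2*half_width+1 trials.

    Windows are truncated at the start and end of the game (assumption).
    """
    values = np.asarray(values, dtype=float)
    prices = np.asarray(prices, dtype=float)
    n = len(values)
    feats = np.zeros((n, 2))
    for t in range(n):
        lo, hi = max(0, t - half_width), min(n, t + half_width + 1)
        v, p = values[lo:hi], prices[lo:hi]
        feats[t] = theil_sen_slope(v, p), spearman_rho(v, p)
    return feats
```

Average ranks matter here because prices are integers from 1 to 10, so ties are common within a seven-trial window.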
Next, we proposed a novel approach, namely time-varying Granger causality with signal-dependent noise (tvGCSDN), to estimate the dynamic effective connectivity between the key brain regions at each round (Supplementary Method S1). The proposed algorithm has two main advantages: 1) Instead of requiring a window length, as do the sliding-window algorithms used in most dynamic functional connectivity analyses Preti et al. (2017); Simony et al. (2016), tvGCSDN makes an estimate at each trial by borrowing strength from the functional magnetic resonance imaging (fMRI) data of the whole bargaining game. 2) tvGCSDN is applicable to systems with signal-dependent noise, which violates the Gaussian white noise assumption of most previous dynamic effective connectivity models Havlicek et al. (2010); Ryali et al. (2011); Sato et al. (2006). Signal-dependent noise is common in neural systems and has been detected in both physiological recordings Harris and Wolpert (1998); Luo et al. (2011); Phan et al. (2019) and fMRI time series Anika et al. (2020); Luo et al. (2013, 2017, 2020). We proved mathematically that fitting a time-invariant model to a dynamic system underestimates the effective connectivity (Supplementary Method S1). We also demonstrated by simulations that the proposed tvGCSDN could track the time-varying parameters of dynamic systems both with and without signal-dependent noise (Supplementary Method S2, Figs. S1 and S2). The strength of effective connectivity estimated by Granger causality has been proven to be equivalent to a measure of the information flow from the cause to the effect Barnett et al. (2009). This equivalence enabled us to test the behavioral associations of the estimated information flows between the key brain regions in relation to strategic sophistication during the bargaining game.
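The underestimation argument can be illustrated numerically. The sketch below assumes a bivariate lag-1 model with homoscedastic noise (so it does not model signal-dependent noise), computes a conventional time-invariant Granger causality as a log ratio of residual variances, and compares a constant coupling with one that flips sign halfway through the recording. The function names, coupling strengths, and noise level are hypothetical choices for illustration.

```python
import numpy as np


def granger_causality(x, y):
    """Time-invariant lag-1 Granger causality from x to y (natural-log units).

    GC = log(residual variance of y ~ 1 + y[-1])
       - log(residual variance of y ~ 1 + y[-1] + x[-1]).
    For jointly Gaussian data this equals twice the transfer entropy from
    x to y in nats (Barnett et al., 2009).
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    target = y[1:]
    X_red = np.column_stack([np.ones(len(target)), y[:-1]])
    X_full = np.column_stack([X_red, x[:-1]])

    def resid_var(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return np.mean((target - X @ beta) ** 2)

    return float(np.log(resid_var(X_red) / resid_var(X_full)))


def simulate(coupling, rng):
    """y[t] = coupling[t] * x[t-1] + Gaussian noise (homoscedastic, for simplicity)."""
    n = len(coupling)
    x = rng.normal(size=n)
    y = np.zeros(n)
    y[1:] = coupling[1:] * x[:-1] + 0.5 * rng.normal(size=n - 1)
    return x, y


rng = np.random.default_rng(0)
n = 2000
gc_constant = granger_causality(*simulate(np.full(n, 0.8), rng))
flip = np.where(np.arange(n) < n // 2, 0.8, -0.8)
gc_flipping = granger_causality(*simulate(flip, rng))
```

In this toy setting, x drives y with the same magnitude in both simulations, yet the sign-flipping system yields a Granger causality close to zero, because a single time-invariant fit averages the positive and negative halves.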

Dynamic switches between strategies uncovered by HMM
According to the hidden states (i.e. the bargaining strategies) decoded by the HMM (Fig. 2A; Supplementary Method S3), we found three types of behavioral windows: 80 incremental windows, 28 conservative windows, and 28 strategic windows. In total, 31.6% of participants (n = 24) switched their strategies during the game, with a mean [SD] of 1.5 [0.67] switches (Fig. 2B). We detected 35 transitions between strategies: 8 incremental to strategic, 12 incremental to conservative, 4 conservative to strategic, 10 conservative to incremental, and 1 strategic to incremental. Compared with our previous time-invariant behavioral groupings using only the last 30 trials (Bhatt et al., 2010; Fig. S3A-B), we reclassified 13.4% of trials (n = 305) into a different behavioral category and discarded 15.7% of trials (n = 357) as unstable.
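Decoding the hidden states and turning them into behavioral windows can be sketched with a standard Viterbi recursion followed by run-length splitting. This sketch assumes that the per-trial log-likelihoods of the observations under each strategy are already available (the emission model is described in Supplementary Method S3 and is not reproduced here), and it does not handle the unidentifiable trials shown in white in Fig. 2B. Function names are hypothetical.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B):
    """Most likely hidden-state path of an HMM.

    log_pi: (K,) log initial probabilities
    log_A:  (K, K) log transition probabilities, rows = from-state
    log_B:  (T, K) log-likelihood of each trial's observation under each state
    """
    T, K = log_B.shape
    delta = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(delta[-1].argmax())
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def behavioral_windows(states):
    """Split a decoded state sequence into runs: (state, start, stop_exclusive)."""
    windows, start = [], 0
    for t in range(1, len(states) + 1):
        if t == len(states) or states[t] != states[start]:
            windows.append((int(states[start]), start, t))
            start = t
    return windows


def count_switches(states, n_states):
    """Matrix of counts of between-state switches (i -> j, i != j)."""
    counts = np.zeros((n_states, n_states), dtype=int)
    for a, b in zip(states[:-1], states[1:]):
        if a != b:
            counts[a, b] += 1
    return counts


# Usage with made-up emissions that clearly favour states 0, 0, 1, 1, 1, 2.
favoured = [0, 0, 1, 1, 1, 2]
log_B = np.log(np.where(np.eye(3)[favoured] == 1, 0.98, 0.01))
log_A = np.log(np.full((3, 3), 0.05) + np.eye(3) * 0.85)  # sticky transitions
log_pi = np.log(np.full(3, 1 / 3))
path = viterbi(log_pi, log_A, log_B)
```

Each run of identical decoded states is one behavioral window, and the off-diagonal switch counts correspond to the between-strategy transitions tallied above.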
To characterize each behavioral window, we fitted a linear regression model using the true value to predict the suggested price. The slope of this regression model, which reflects how buyers revealed information about item values to sellers during the game, was used as a behavioral parameter for the pattern of information revelation (IR). As expected, the conservative windows had IRs close to zero (mean ± SD = 0.13 ± 0.11) with low model fits (R² = 0.13 ± 0.11), the incremental windows showed positive IRs (0.49 ± 0.19) with good fits (R² = 0.73 ± 0.15), and the strategic windows exhibited negative IRs (−0.59 ± 0.26) with good fits (R² = 0.46 ± 0.20; Fig. 2C). The values of the virtual items, the starting time, and the length of the behavioral window were compared among the three types of behavioral windows, and no significant difference was found (Table S1). We also found that older buyers had more incremental windows than younger buyers (coefficient = 0.36, p = 0.0015), and that females had more incremental windows than males (coefficient = −0.39, p = 0.0004; Table S2).
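The IR and R² of a behavioral window follow directly from an ordinary least-squares line of suggested price on true value. In this minimal sketch, setting R² to 0 when the prices are constant is our assumption, and the function name is hypothetical.

```python
import numpy as np


def information_revelation(values, prices):
    """Return (IR, R^2) for one behavioral window.

    IR is the least-squares slope of suggested price on true value; R^2 is the
    ordinary coefficient of determination of that fit. If the prices are
    constant (a purely uninformative window), R^2 is set to 0.0 by assumption.
    """
    values = np.asarray(values, dtype=float)
    prices = np.asarray(prices, dtype=float)
    slope, intercept = np.polyfit(values, prices, deg=1)
    fitted = slope * values + intercept
    ss_res = np.sum((prices - fitted) ** 2)
    ss_tot = np.sum((prices - prices.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(slope), float(r2)
```

A positive IR corresponds to an incremental window, a negative IR to a strategic window, and an IR near zero with a low R² to a conservative window.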
In the estimated initial distribution of the HMM, the probabilities of using the incremental strategy, the conservative strategy, and strategic deception were 58%, 37%, and 5%, respectively (Fig. 2D). In the estimated transition matrix, the probabilities of the incremental-to-conservative and the conservative-to-strategic transitions were 0.03 and 0.05, respectively (Fig. 2A). As predicted by propagating the initial distribution through 59 transitions, the probability of adopting strategic deception increased significantly to 26%, while the probabilities of the incremental and the conservative strategies decreased by 7 and 14 percentage points, respectively (Fig. 2D). Similar trends were observed at the end of the game: 30% of participants used strategic deception, while the percentages of participants using the conservative and the incremental strategies decreased by 14 and 7 percentage points, respectively.
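The predicted distribution at a later round is the initial distribution multiplied by the transition matrix raised to the number of transitions, π_t = π_0 A^t. The sketch below uses the initial distribution reported above but a placeholder transition matrix: apart from the 0.03 and 0.05 entries quoted in the text, its values are invented for illustration, so the printed numbers will not match Fig. 2D.

```python
import numpy as np

# State order: incremental, conservative, strategic.
# PI0 is the initial distribution reported in the text. A_PLACEHOLDER is a
# made-up matrix that only reuses the two entries quoted in the text
# (0.03 and 0.05); it is NOT the estimated matrix from Fig. 2A.
PI0 = np.array([0.58, 0.37, 0.05])
A_PLACEHOLDER = np.array([
    [0.97, 0.03, 0.00],
    [0.02, 0.93, 0.05],
    [0.01, 0.00, 0.99],
])


def propagate(pi0, A, steps):
    """Marginal state distribution after `steps` Markov transitions: pi0 @ A**steps."""
    pi0 = np.asarray(pi0, dtype=float)
    return pi0 @ np.linalg.matrix_power(np.asarray(A, dtype=float), steps)


# Distributions at rounds 1, 21, 41, and 60 (0, 20, 40, and 59 transitions).
for steps in (0, 20, 40, 59):
    print(steps + 1, np.round(propagate(PI0, A_PLACEHOLDER, steps), 3))
```

Substituting the estimated transition matrix for the placeholder would give the model-predicted distributions at rounds 21, 41, and 60.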
Evaluating the quality of clustering by the Davies-Bouldin Index (DBI; the smaller the DBI, the better the clustering quality Davies and Bouldin, 1979), we found that the time-varying grouping in the current study had a better clustering quality (DBI₁ = 0.58) than the time-invariant grouping reported previously (Bhatt et al., 2010; DBI₂ = 0.64; the 95% confidence interval of DBI₂ − DBI₁ was [0.055, 0.062], established by 3000 bootstraps). This advantage remained significant when evaluated by the Calinski-Harabasz Index (CH; the bigger the CH, the better the clustering quality Caliński and Harabasz, 1974). The CH of the current clustering (CH₁) is 264.9 and that of the previous clustering (CH₂) is 173.2; the 95% confidence interval of CH₁ − CH₂ was [20.6, 171.3], established by 3000 bootstraps. This might be important for the dynamic effective connectivity analysis, since a better definition of the behavioral windows could better identify dynamic activity between brain regions.
Fig. 2. (A) The decoding process of the hidden Markov model. The observations used in the method are the robust fit slope, the Spearman correlation coefficient, and the corresponding p-value. (B) Behavioral windows identified for each subject: blue for incremental windows, green for conservative windows, red for strategic windows, and white for unidentifiable trials. (C) The clustering result of behavioral windows in the two-dimensional feature space, where the x-axis represents IR and the y-axis represents R². (D) The transition of the probability distribution of hidden states from the first round to the 21st round, the 41st round, and the end of the game.
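Both clustering indices and a bootstrap interval for their difference can be sketched in NumPy. The index definitions follow the standard forms with Euclidean distances; resampling each grouping independently and within clusters (so that no cluster disappears from a resample) is our assumption about how the 3000 bootstraps could be drawn, not a description of the authors' procedure. Function names are hypothetical.

```python
import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better), Euclidean distances to centroids."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # Mean distance of each cluster's points to its own centroid.
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    d = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # exclude each cluster's comparison with itself
    return float(((s[:, None] + s[None, :]) / d).max(axis=1).mean())


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index (higher is better)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    ks = np.unique(labels)
    n, k = len(X), len(ks)
    overall = X.mean(axis=0)
    between = within = 0.0
    for lab in ks:
        Xk = X[labels == lab]
        ck = Xk.mean(axis=0)
        between += len(Xk) * np.sum((ck - overall) ** 2)
        within += np.sum((Xk - ck) ** 2)
    return float(between * (n - k) / (within * (k - 1)))


def _stratified_resample(X, labels, rng):
    """Resample rows with replacement within each cluster so no cluster vanishes."""
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == lab), size=np.sum(labels == lab), replace=True)
        for lab in np.unique(labels)
    ])
    return X[idx], labels[idx]


def bootstrap_diff_ci(index_fn, X_a, lab_a, X_b, lab_b, n_boot=3000, seed=0):
    """95% percentile CI of index_fn(b) - index_fn(a), resampling each grouping independently."""
    rng = np.random.default_rng(seed)
    X_a, lab_a = np.asarray(X_a, dtype=float), np.asarray(lab_a)
    X_b, lab_b = np.asarray(X_b, dtype=float), np.asarray(lab_b)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        diffs[i] = (index_fn(*_stratified_resample(X_b, lab_b, rng))
                    - index_fn(*_stratified_resample(X_a, lab_a, rng)))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return float(lo), float(hi)
```

For the comparison in the text, grouping b would be the previous time-invariant grouping for DBI₂ − DBI₁, and the arguments would be swapped for CH₁ − CH₂.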

Strategic deception engaged additional brain circuits
When the time-invariant grouping was used for the effective connectivity analysis between the three key brain regions (i.e. BA10, rDLPFC and rTPJ), no group difference in any information flow could be identified by either the time-invariant GC model (…). In contrast, using the behavioral windows revealed by the HMM, we found that the rDLPFC-and-rTPJ interaction differed significantly among the three types of behavioral windows after Bonferroni correction (threshold 0.05/6 ≈ 0.0083, i.e. correcting for the six directed information flows among the three regions). The strength of this interaction was measured by the mean information flow (IF), averaged over the trials within the same behavioral window. This interaction differed in both directions, i.e. rTPJ-to-rDLPFC (F(2, 111) = 9.88, p = 0.0001; Fig. 3D) and rDLPFC-to-rTPJ (F(2, 114) = 5.14, p = 0.0073; Fig. 3C). Post-hoc analyses showed that strategic deception engaged stronger information flows in both directions than the other two bargaining strategies.
Regional variation in the hemodynamic response function (HRF) could be a potential confounding factor in GC-based models. However, we compared the HRF delay parameters between each pair of the three key brain regions and found no significant difference (Fig. S10A-C; Supplementary Method S4). Therefore, regional variation of the HRF is not a significant problem for the current analyses. We also systematically tested the performance of the proposed tvGCSDN under various combinations of HRF delays and neuronal transmission delays between the estimated cause and effect brain regions. Numerical simulations assessing the model at different levels of HRF delay and neuronal transmission delay (Supplementary Method S5) showed that when the HRF delay was slower in the cause region but faster in the effect region, the most likely outcome was a non-significant information flow, whereas a reversed information flow was less likely to be detected. For example, with no difference in HRF delay and a neuronal transmission delay of 40 ms or 80 ms, the tvGCSDN yielded 76% or 26% non-significant causalities, respectively, and no reversed causality (Fig. S11). These simulations demonstrated that the dynamic information flow estimated by the tvGCSDN was a reliable measure of the dynamic strength of the corresponding effective connectivity.
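A simulation of this kind needs BOLD signals with region-specific HRF delays. The sketch below generates them by convolving a neural time series with a double-gamma HRF shifted by a chosen delay. The SPM-like shape parameters, the 10 ms sampling step, the example delays, and all function names are assumptions for illustration and are not taken from Supplementary Method S5.

```python
import math

import numpy as np


def double_gamma_hrf(t, delay=0.0):
    """Canonical-style double-gamma HRF shifted later by `delay` seconds.

    Assumed shape: a gamma(6) peak minus a gamma(16) undershoot scaled by 1/6
    (SPM-like defaults); this is not necessarily the HRF used in the paper.
    """
    t = np.asarray(t, dtype=float) - delay
    h = np.zeros_like(t)
    pos = t > 0
    tp = t[pos]
    h[pos] = (tp ** 5 * np.exp(-tp) / math.gamma(6)
              - tp ** 15 * np.exp(-tp) / (6.0 * math.gamma(16)))
    return h


def simulate_bold(neural, dt, hrf_delay=0.0, kernel_length=32.0):
    """Convolve a neural time series (sampled every `dt` s) with a delayed HRF."""
    neural = np.asarray(neural, dtype=float)
    kernel = double_gamma_hrf(np.arange(0.0, kernel_length, dt), delay=hrf_delay)
    return np.convolve(neural, kernel)[: len(neural)] * dt


# Example: a cause region with a slower HRF (+1 s) and an effect region whose
# neural signal lags the cause by 80 ms but uses the default HRF.
dt = 0.01
rng = np.random.default_rng(0)
cause_neural = rng.normal(size=6000)
# np.roll wraps the last 80 ms around to the start; acceptable for a toy example.
effect_neural = np.roll(cause_neural, int(round(0.08 / dt)))
bold_cause = simulate_bold(cause_neural, dt, hrf_delay=1.0)
bold_effect = simulate_bold(effect_neural, dt)
```

A slower HRF in the cause region shifts its BOLD signal later in time, which can partly mask the neural lead of the cause over the effect; this is the scenario in which non-significant information flow becomes more likely.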

Top-down control correlated with level of deception in strategic windows
After Bonferroni correction (0.05/6 ≈ 0.0083), we found that a more negative IR (i.e. a higher level of deception) was associated with a stronger IF(BA10→rTPJ) in the strategic windows after controlling for age, sex, socioeconomic status, and the activities of these two brain regions (coefficient = −0.6051, p = 0.0078; Fig. 3E). Meanwhile, a stronger IF(BA10→rTPJ) was also associated with a better model fit (coefficient = 0.6953, p = 0.0019; Fig. 3F). However, these associations were not significant in either the incremental or the conservative windows (Figs. S6-S7). Across all behavioral windows, we found that IF(rTPJ→rDLPFC), but not IF(rDLPFC→rTPJ), moderated the association between IF(BA10→rTPJ) and the IR (F(1, 100) = 4.99, p = 0.0277; two-way analysis of variance).
Fig. 3. … (E) Behavioral associations between the information revelation (IR) and the mean BA10-to-rTPJ information flow in the strategic window group. (F) Behavioral associations between R² and the mean BA10-to-rTPJ information flow in the strategic window group. (G) Brain map of the significant information flows (blue) and the behaviorally associated information flow (red). *p < 0.05; **p < 0.01; ***p < 0.001.
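The two analyses above can be sketched with ordinary least squares: a partial correlation computed from residuals after regressing out the covariates, and a moderation test asking whether adding an interaction term improves the fit (an F test with one numerator degree of freedom). Treating the reported two-way analysis of variance as this nested-model interaction test is our assumption, and all names are hypothetical.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS regression on an intercept plus covariates."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta


def partial_corr(x, y, covariates):
    """Correlation between x and y after removing linear effects of covariates."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return float(np.corrcoef(rx, ry)[0, 1])


def interaction_f_test(y, x, m):
    """F test for adding the x*m interaction to y ~ 1 + x + m (moderation by m)."""
    y, x, m = (np.asarray(a, dtype=float) for a in (y, x, m))
    n = len(y)
    X_reduced = np.column_stack([np.ones(n), x, m])
    X_full = np.column_stack([X_reduced, x * m])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ beta) ** 2))

    df_within = n - X_full.shape[1]
    f_stat = (rss(X_reduced) - rss(X_full)) / (rss(X_full) / df_within)
    return f_stat, 1, df_within
```

In this framing, y would be the IR of each window, x the mean IF(BA10→rTPJ), m the candidate moderator such as IF(rTPJ→rDLPFC), and the covariates the age, sex, socioeconomic status, and regional activity values.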

Discussion
Dynamics in social decision-making demonstrate that, with cognitive control, one can explore different social strategies during competitive social interactions. It has been hypothesized previously that dynamic social strategies are supported by the dynamic reconfiguration of neural information flows in certain brain circuits. However, such functional reconfiguration remains difficult to investigate, partially owing to both its dynamic nature and noise in neural recordings. The current study proposed a hidden Markov model to uncover the trial-to-trial transitions between bargaining strategies and a time-varying GCSDN model to reveal the functional reconfiguration of effective connectivity supporting these transitions. The current study is the first to demonstrate that stronger information flows between the rDLPFC and the rTPJ are associated with engaging in strategic deception and, more specifically, that top-down control from BA10 to the rTPJ is associated with the level of deception.

Behavioral modeling of dynamics during multi-round social interactions
Notably, the dynamics revealed by the HMM suggest an interaction between the theory of sequential equilibrium in repeated economic games Camerer and Weigelt (1988) and the hypothesis of intuitive prosociality Jamil and Jason (2013). When the incremental strategy is considered the most prosocial behavior and strategic deception the most self-interested behavior, the conservative strategy falls somewhere in between: neither collaborating nor cheating. The hypothesis of intuitive prosociality is partially supported by the finding that 58% of buyers started with a prosocial strategy, i.e. sharing their rewards with sellers. However, individual variation in intuitive prosociality Speer et al. (2020) is also substantial, as the remaining 42% of buyers started with non-prosocial strategies. When bargaining repeats many times, some buyers may come to view the game as a competitive setting and begin to explore different bargaining strategies to maximize their reward. As a result of this exploration process, many participants adopted strategic deception, a mixed strategy that combines reputation-building and reward-collecting. As predicted by the theory of sequential equilibrium, rational buyers build their reputation in the sellers' minds by sharing rewards with the sellers early in the game, but at a later stage they suggest much lower prices for highly valued items and progressively collect more rewards. This prediction describes the strategic deception examined in the present study very well. However, not all participants had adopted strategic deception by the end of the game. Even if the game were repeated an infinite number of times, the steady-state distribution of the HMM (incremental: 48%, conservative: 23%, strategic: 29%) predicts that many, but not all, buyers would adopt strategic deception. Therefore, in the current task design, some buyers explored various strategies with the goal of maximizing reward through repeated competition, while others chose to play the game collaboratively by sharing the reward. The estimated transition probability matrix of the HMM characterizes this exploration process, suggesting a gradual increase in computational complexity. In particular, the probabilities of transitioning directly between the incremental strategy and strategic deception are near zero, but the probability of transitioning indirectly from the incremental strategy to strategic deception via the conservative strategy is greater than zero (Fig. 2A).
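The steady-state distribution and the indirect route through the conservative strategy can both be read off a transition matrix: the steady state is the left eigenvector for eigenvalue 1, and two-step probabilities are entries of A². The sketch below uses a hypothetical placeholder matrix (only the 0.03 and 0.05 entries come from the text), so it illustrates the computation rather than reproducing the reported 48%, 23%, and 29%. Names are hypothetical.

```python
import numpy as np

# Hypothetical placeholder matrix; state order is incremental, conservative,
# strategic. Only 0.03 and 0.05 come from the text; the rest is made up.
A = np.array([
    [0.97, 0.03, 0.00],
    [0.02, 0.93, 0.05],
    [0.01, 0.00, 0.99],
])


def stationary_distribution(A):
    """Left eigenvector of A for eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(np.asarray(A, dtype=float).T)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.sum()


def two_step(A, i, j):
    """Probability of reaching state j from state i in exactly two transitions."""
    return float((A @ A)[i, j])


INC, CON, STR = 0, 1, 2
pi_inf = stationary_distribution(A)
direct = A[INC, STR]               # one-step incremental -> strategic
indirect = two_step(A, INC, STR)   # sums over intermediate states, incl. conservative
```

With a zero direct entry, the two-step probability reduces to the product of the incremental-to-conservative and conservative-to-strategic entries, which is how an indirect route can be positive even when the direct one is not.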

Dynamic effective connectivity for brain functional reconfiguration
Explicit modeling of the dynamics during social interactions enhanced our findings of how the brain's effective connectivity was modulated. We identified within-subject dynamic switches between different bargaining strategies in more than 31% of participants; that is, these participants seemed to be exploring different strategies in this competitive social interaction. Even during the last 30 trials of the game, in which the buyers were assumed in previous studies to be implementing a stable strategy Bhatt et al. (2010); Luo et al. (2017), we still found that more than 20% of trials needed to be reclassified as a different strategy. To support different strategies, the underlying information processing in brain circuits needs to be functionally reconfigured. This implies that the observed fMRI signals are generated by a time-varying model, so assuming a time-invariant model would be an oversimplification. This oversimplification limited the ability of the effective connectivity analyses in previous studies to identify any group difference in connectivity between distinct strategies. After decoding the hidden state (i.e. the strategy) for each trial, consecutive self-transitions within the same state naturally defined a behavioral window, within which the participant implemented the same strategy on every trial. Compared with the neuroimaging data from the last 30 trials, the data within a behavioral window are more homogeneous and more likely to be generated by the same configuration of the underlying brain circuit. For the relatively stable activations of both BA10 and the rDLPFC, our findings replicated the previous time-invariant analysis: both regions showed stronger activation for strategic deception than for the other two strategies during the game Bhatt et al. (2010). However, for a brain region that is highly dynamic during this game (i.e. the rTPJ), the homogeneity of the data gives the current analysis one major advantage: it can detect significant differences in the effective connectivity supporting distinct strategies, which could not be detected previously Bhatt et al. (2010); Luo et al. (2017).
Another advantage of the current approach is the use of a nonparametric weighting scheme that borrows strength from the whole time series to inform the local estimation at each trial. Given the dynamic nature of social interactions, it is methodologically difficult to investigate the neural mechanisms underlying information exchange owing to the limited number of scans for each event of interest King-Casas et al. (2005). The sliding window has been a popular approach to address this problem Lindquist et al. (2009), but the resulting connectivity can depend unpredictably on the window length Lindquist et al. (2014). A model of functional connectivity has been elegantly embedded into an HMM by treating patterns of connectivity as hidden states Warnick et al. (2017). However, this model cannot decouple an interaction between two brain regions into information flows in two opposite directions, since functional connectivity in this model is a symmetric measure. In social decision-making, top-down (or expectation-driven) and bottom-up (or stimulus-driven) information flows are likely to serve different functional roles Baldauf and Desimone (2014). A previous study found that stronger top-down information flow from the dorsal attention network to the ventral attention network was associated with better performance in an attention task, while bottom-up information flow in the opposite direction was associated with worse performance Wen et al. (2012). Under the assumption that information flows among brain regions evolve smoothly over time, here we used the whole time series to make a robust estimate at each trial by employing a nonparametric weighting scheme Fan and Yao (2003). As demonstrated both theoretically and numerically (Supplementary Method S1), a time-invariant model could miss significant interactions when positive and negative effects average out. Therefore, estimating the Granger causality at each trial and then averaging only among the trials with the same bargaining strategy is the key to revealing the dynamic effective connectivity.
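The idea of borrowing strength from the whole time series can be sketched as kernel-weighted least squares: at each time point, every observation contributes with a Gaussian weight that decays with its distance in time, in the spirit of the local nonparametric smoothing of Fan and Yao (2003). This is a plain time-varying Granger causality with homoscedastic noise, not the authors' tvGCSDN; the kernel, bandwidth, lag order, toy system, and names are all assumptions.

```python
import numpy as np


def tv_granger(x, y, bandwidth):
    """Kernel-weighted, lag-1 Granger causality from x to y at every time point.

    At each time t, weighted least squares with Gaussian kernel weights
    exp(-0.5 * ((s - t) / bandwidth) ** 2) fits a full model
    y[s] ~ 1 + y[s-1] + x[s-1] and a reduced model y[s] ~ 1 + y[s-1].
    Returns (gc, coef): gc[t] is the log ratio of weighted residual variances
    (reduced over full), coef[t] the local weight on x[s-1] in the full model.
    Unlike tvGCSDN, this sketch assumes signal-independent noise.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    target = y[1:]
    X_red = np.column_stack([np.ones(len(target)), y[:-1]])
    X_full = np.column_stack([X_red, x[:-1]])
    times = np.arange(len(target))
    gc, coef = np.empty(len(target)), np.empty(len(target))
    for t in times:
        w = np.exp(-0.5 * ((times - t) / bandwidth) ** 2)
        sw = np.sqrt(w)

        def weighted_fit(X):
            beta, *_ = np.linalg.lstsq(X * sw[:, None], target * sw, rcond=None)
            resid = target - X @ beta
            return beta, np.sum(w * resid ** 2) / np.sum(w)

        beta_full, var_full = weighted_fit(X_full)
        _, var_red = weighted_fit(X_red)
        gc[t] = np.log(var_red / var_full)
        coef[t] = beta_full[2]
    return gc, coef


# Toy system: x drives y with a coupling that flips sign halfway through.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
c = np.where(np.arange(n) < n // 2, 0.8, -0.8)
y = np.zeros(n)
y[1:] = c[1:] * x[:-1] + 0.5 * rng.normal(size=n - 1)
gc, coef = tv_granger(x, y, bandwidth=20.0)
```

In the toy system, the local coefficient stays positive in the first half and negative in the second, whereas a single time-invariant fit would average the two regimes toward zero. The bandwidth trades temporal resolution against estimation noise, much as the window length does in sliding-window analyses, but every observation still contributes to every local estimate.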

Additional information flow may be necessary for strategic deception
The differences in effective connectivity identified above reflect what we know about the behavioral types identified in the two-party bargaining game. Strategic deception requires a forward-looking, longer-term strategy of manipulating one's reputation in the eyes of the partner in order to increase aggregate rewards. The incremental strategy focuses only on the information present in the current round, while the conservative strategy is the simplest of all, sending uninformative signals to sellers. Consistent with these strategic differences, we found significantly stronger information flows between the rTPJ and the rDLPFC when buyers engaged in strategic deception than when they used the other two strategies. This information exchange is particularly interesting in light of the proposed functions of these two brain areas Stallen and Sanfey (2013). Evidence from neuromodulation during the ultimatum game has shown that the rTPJ is implicated in perspective-taking by proposers during bargaining, while the rDLPFC is implicated in the inhibition of self-interest in responders Speitel et al. (2019). However, the interaction between these two areas has not been well characterized, as neuromodulation tools (e.g. tDCS or rTMS) are likely to affect both the targeted region and its interactions with other areas. After controlling for regional activities, an enhanced rTPJ-and-rDLPFC interaction was identified in our sample. During strategic deception, higher rDLPFC activation reflects the greater demands on both working memory and cognitive control, since buyers need to keep track of their previous suggestions to infer their reputation in the sellers' minds Bhatt et al. (2010) and also need to control selfish impulsive drives MacDonald et al. (2000); Mansouri et al. (2009); Spitzer et al. (2007).
Our finding of an enhanced rTPJ-and-rDLPFC information exchange for strategic deception could be interpreted as indicating that the higher-level computation of social strategy in the rDLPFC (e.g., orienting attention, allocating working memory, comparing complex strategies) is triggered and informed, from trial to trial, by the immediate attentional demands of modeling the other player's mind in the rTPJ. Indeed, evidence from neuromodulation has indicated that the rTPJ is causally involved in social decision-making where mentalizing is necessary Hill et al. (2017).
The enhanced rTPJ-to-rDLPFC information flow facilitates a significant association between the BA10-to-rTPJ information flow and the level of deception. BA10 has been implicated in long-term goal maintenance Burgess et al. (2007) and is highly activated during strategic deception Bhatt et al. (2010). A stronger BA10-to-rTPJ information flow, acting as top-down control, was associated with a higher level of deception (i.e. suggesting much lower prices for high-value items). This association might indicate that rTPJ activation during strategic deception is closely regulated by the long-term goal of obtaining more rewards. This interpretation may explain why rTPJ activation was not consistently greater across all deception trials but was instead modulated by the value of the bargaining item.
Interestingly, the rTPJ is also a key node of the ventral attention network (VAN) Schuwerk et al. (2017) , while the rDLPFC is a key node of the dorsal attention network (DAN) Corbetta and Shulman (2002) . It has long been hypothesized that top-down control from the DAN to the VAN is necessary to filter out unimportant distractions Corbetta et al. (2008) . In a visual attention task, stronger DAN-to-VAN control is associated with better performance Wen et al. (2012) . Therefore, our finding of strengthened rDLPFC-to-rTPJ information flow for strategic deception suggests that apart from mentalizing, attention is also a key cognitive process to achieve better performance in competitive social interactions.

Implications for dysfunctional social behavior
The current study might contribute to the study of dysfunctional social behavior. The identification of neural mechanisms underlying disturbed social functioning has been an important research topic for developmental and personality disorders with dysfunctional social behavior Lazarus et al. (2014); Mier et al. (2013); Ruocco et al. (2010). For example, tasks from behavioral economics involving trust and cooperation have been used to gain further insight into the interpersonal functioning of individuals with borderline personality disorder King-Casas et al. (2008); Unoka et al. (2009). In the current study, the two-party bargaining paradigm likewise uses a social signal (i.e., the suggested prices) to probe social functioning during an fMRI experiment. Unlike the previous tasks, the current multi-round bargaining design without feedback allowed participants to explore various bargaining strategies, and thereby provided a new opportunity to investigate the dynamics of social interaction and their neural correlates. Applying dynamic behavioral modeling to this task might help us better understand the neural mechanisms underpinning patients' difficulties in initiating and maintaining social interactions.

Limitations
Granger causal modeling (GCM) of fMRI data has been widely used as a statistical tool to decompose the interaction between two brain regions into information flow in two directions Friston et al. (2013); Seth et al. (2015). Interpreting the resulting effective connectivity has certain limitations, such as the regional variation of the hemodynamic response function (HRF) and the low sampling rate of the BOLD signal Smith et al. (2011). When the regional variation of the HRF is within a certain range (e.g., the hemodynamic delay is comparable with the neural delay between the cause and effect brain regions), GCM can still give a reliable inference of effective connectivity. This has been discussed at length both theoretically and empirically Schippers et al. (2011). The strength of information flow between brain regions estimated by GCM has been associated with task performance Wen et al. (2012). In our case, we deconvolved the HRF from the BOLD signal before the effective connectivity analysis and confirmed that the hemodynamic delay was comparable among the three brain regions of interest. Statistical tools for effective connectivity are important for neuroscience Park and Friston (2013). However, given the complexity of the brain, current approaches are still in their infancy, and the resulting conclusions should be taken as suggestive. Nevertheless, increasingly sophisticated methods for causal inference from fMRI data, such as the method proposed in the current paper, promise to significantly increase our understanding of the neural computation underlying social decision-making. Moreover, social cognition is not only dynamic but also context-dependent: interactions with and without immediate feedback may have different neural correlates. Future studies are needed to investigate the neural correlates of bargaining with immediate feedback.

Participants
Secondary data analyses were performed on a sample of 76 healthy subjects who participated in accordance with a protocol approved by the Baylor College of Medicine Institutional Review Board. Two subjects were excluded from the current study. One pressed the button so quickly on many trials that each of those trials contained only one scan. The other was excessively slow in making decisions, with one trial of the bargaining game lasting more than 7 min. Therefore, 74 subjects (36 females and 38 males) were included in the following analysis.

Two-party bargaining game
In this game, two players, a buyer and a seller, played 60 rounds of a bargaining task. The duration of each round was self-paced, depending on how long the participants took to respond. The inter-trial interval was drawn from a uniform distribution between 4 and 6 s. In each round, the buyer received a randomly generated private value of a virtual item and then suggested a price at which to buy this item from the seller. Without knowing the true value of the item, the seller submitted a price to sell this item based on the buyer's suggested price. If the submitted price was less than the true value of the item, the true value was shared by both parties. In that case, the buyer collected a reward equal to the true value minus the submitted price, and the seller received a reward equal to the submitted price. Otherwise, neither party received any reward. The buyers could adopt different strategies during this game:
- anchoring their suggested price to the true value;
- suggesting a constant price that revealed no information about the true value;
- mimicking the first strategy but suggesting high prices for low-value items and low prices for high-value items (Fig. 1A).

During the 60 rounds of bargaining, no feedback was given to either the buyer or the seller. Subjects received their aggregate earnings over the 60 trials at a predetermined exchange rate at the end of the experiment. Both subjects were in fMRI scanners during the entire session. The current study focused only on the buyers.
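The payoff rule above can be made concrete with a short simulation. This is a minimal sketch for illustration only. It assumes integer values and prices from 1 to 10 (as in Fig. 1A) and private values drawn uniformly. The three buyer policies (`suggest_price`) and the seller policy passed to `simulate_buyer` are hypothetical stand-ins for the strategy types, not participants' actual behavior.

```python
import random


def round_payoff(true_value, submitted_price):
    """Return (buyer_reward, seller_reward) for one round of the game."""
    if submitted_price < true_value:
        # Trade executes: the buyer keeps value minus price, the seller gets the price.
        return true_value - submitted_price, submitted_price
    # No trade: neither party earns anything this round.
    return 0, 0


def suggest_price(true_value, strategy):
    """Hypothetical buyer policies, one per strategy type (values/prices 1-10)."""
    if strategy == "incremental":
        return true_value  # suggestion tracks the value: positive slope
    if strategy == "strategic":
        return 11 - true_value  # high suggestion for low value and vice versa: negative slope
    if strategy == "conservative":
        return 5  # constant suggestion reveals nothing: zero slope
    raise ValueError(f"unknown strategy: {strategy}")


def simulate_buyer(strategy, seller_policy, n_rounds=60, seed=0):
    """Aggregate earnings over n_rounds; no trial-by-trial feedback is used."""
    rng = random.Random(seed)
    buyer_total = seller_total = 0
    for _ in range(n_rounds):
        value = rng.randint(1, 10)  # private value, drawn uniformly (assumption)
        price = seller_policy(suggest_price(value, strategy))
        buyer_reward, seller_reward = round_payoff(value, price)
        buyer_total += buyer_reward
        seller_total += seller_reward
    return buyer_total, seller_total
```

For example, `simulate_buyer("strategic", lambda s: max(1, s - 1))` pits a deceptive buyer against a hypothetical seller who undercuts the suggestion by one unit. Neither side sees per-round outcomes, which mirrors the no-feedback design.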

Dynamic behavior modeling using the hidden Markov model
In the previous study Bhatt et al. (2010), the buyer's suggested prices ($s$) were regressed against the true values ($v$) by a linear model for information revelation. This fit used only trials after behavior had stabilized in the game, which was assumed to be the second half of the 60 rounds:

$$ s = \beta_0 + \beta_1 v + \varepsilon. \tag{1} $$

Previously, we used two behavioral parameters to identify the different behavioral groups among the buyers. These were the slope $\beta_1$, denoted as the parameter for information revelation (IR), and the variance explained, $R^2$ Bhatt et al. (2010). However, assuming that each subject used only one strategy throughout the game might be an oversimplification. Subject 64 is an example. A positive slope $\beta_1$ was found in the first 20 trials, indicating an incremental strategy (Fig. 1B and C). Between the 30th and the 40th trials, however, behavior was characterized as strategic, with a negative slope (Fig. 1B and D). In the last 20 trials, the slope became nearly zero, which was considered conservative (Fig. 1B and E).
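The two parameters of Eq. (1) can be computed per subject as in the sketch below. It is a sketch under one assumption: that the fit is an ordinary least-squares regression of suggestion on value. The function name `information_revelation` is hypothetical. When the suggestion is constant, $R^2$ is undefined, and the sketch returns 0 by convention.

```python
import numpy as np


def information_revelation(values, suggestions):
    """OLS fit of s = beta0 + beta1 * v; returns (beta1 = IR slope, R^2)."""
    v = np.asarray(values, dtype=float)
    s = np.asarray(suggestions, dtype=float)
    X = np.column_stack([np.ones_like(v), v])  # intercept and true value
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    ss_res = np.sum((s - X @ coef) ** 2)
    ss_tot = np.sum((s - s.mean()) ** 2)
    # A constant suggestion has no variance to explain; report R^2 = 0 (convention).
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return float(coef[1]), float(r_squared)
```

Applied to `values[30:]` and `suggestions[30:]`, this reproduces the "second half" summary. Applied to a short sliding window instead, it gives the local slope that changes sign across the behavioral windows of Subject 64.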
Here, we used a hidden Markov model (HMM) to uncover within-subject dynamics in bargaining strategy. We assumed that the transitions among the three strategies satisfied the Markov assumption and that the observed bargaining behavior was generated by the hidden state.

Mathematically, the three bargaining strategies (i.e., the incremental, conservative, and strategic strategies) were modeled as the hidden states $S = \{s_k\}_{k=1}^{3}$. The transition probability matrix is $A = (a_{kl})$, $k, l \in \{1, 2, 3\}$, where $a_{kl}$ is the probability of a transition from state $k$ to state $l$. The observational data are denoted as $O_{2:T,i} = (o_{2,i}, \dots, o_{T,i})$ ($i = 1, \dots, 76$ and $T = 60$). Here, $o_{t,i} = (r_{t,i}, p_{t,i}, \beta_1^{(t,i)})$ is the observation vector at the $t$th round of the $i$th subject.

Assuming that the system changed slowly, we calculated the observations over an interval of 7 rounds: the current round, the 3 preceding rounds, and the 3 following rounds. The three components are as follows. $r_{t,i}$ is the Spearman correlation coefficient between the true value and the suggested price, calculated over the adjacent trials $(t-3, t-2, \dots, t+3)$. $p_{t,i}$ is its corresponding $p$-value. $\beta_1^{(t,i)}$ is the slope in Eq. (1), estimated using a robust regression algorithm in Matlab (i.e., [robustfit]).

We further assumed that the observation follows a Gaussian distribution with state-dependent parameters. The model parameters are then

$$ \lambda = \left(\pi, A, \{\mu_k, \Sigma_k\}_{k=1}^{3}\right), $$

where $\pi$ is the initial probability distribution of the three hidden states, and $\mu_k$ and $\Sigma_k$ are the mean and variance of the Gaussian distribution for state $k$. The Viterbi algorithm (Forney, 1973) was employed to decode the most likely sequence of hidden states from the observation sequence of each subject. More details about this algorithm are provided in Supplementary Method S3. When a hidden state persisted for several rounds before transitioning to another state, this naturally defined a behavioral window for that hidden state.
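The decoding step can be illustrated with a log-space Viterbi sketch for a Gaussian HMM. Several points are assumptions. First, it takes the parameters $\pi$, $A$, $\mu_k$ already estimated; estimation is not shown. Second, it uses diagonal (per-dimension) variances, although the text specifies only "mean and variance". Third, the toy parameters in the test are hypothetical. This is a generic Viterbi implementation and not the authors' code.

```python
import numpy as np


def gaussian_log_pdf(obs, means, variances):
    """Log density of each observation under each state's diagonal Gaussian.

    obs: (T, D); means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = obs[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None], axis=2)


def viterbi(obs, pi, A, means, variances):
    """Most likely hidden-state sequence for one subject (log-space Viterbi)."""
    T, K = obs.shape[0], pi.shape[0]
    log_b = gaussian_log_pdf(obs, means, variances)
    log_A = np.log(A)
    delta = np.log(pi) + log_b[0]  # best log-probability of paths ending in each state
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A  # scores[j, k]: arrive in state k from state j
        backptr[t] = np.argmax(scores, axis=0)
        delta = scores[backptr[t], np.arange(K)] + log_b[t]
    states = np.zeros(T, dtype=int)
    states[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):  # trace the best path backwards
        states[t - 1] = backptr[t, states[t]]
    return states
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over 60 rounds. A "sticky" transition matrix with a large diagonal encodes the assumption that strategies change slowly.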
As a quality control for detecting a stable strategy, a behavioral window was required to contain at least 8 adjacent rounds; otherwise, the strategy was considered unstable. Our findings remained the same under two variations. The first was using different intervals to calculate the observations, i.e., intervals containing 5 or 9 adjacent trials, or an interval containing only the past 7 adjacent trials (Fig. S8). The second was choosing different thresholds for the minimum length of behavioral windows, from 7 to 11 adjacent rounds (Fig. S9).
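Turning a decoded state sequence into behavioral windows is a run-length step with a minimum-length filter. The sketch below assumes 0-based round indices and half-open `(start, end)` intervals. The helper name `behavioral_windows` is hypothetical.

```python
def behavioral_windows(states, min_length=8):
    """Split a decoded state sequence into runs and keep runs of >= min_length rounds.

    Returns a list of (state, start, end) tuples with 0-based, end-exclusive indices.
    """
    windows = []
    start = 0
    for t in range(1, len(states) + 1):
        # A run ends at the sequence end or when the decoded state changes.
        if t == len(states) or states[t] != states[start]:
            if t - start >= min_length:  # shorter runs count as unstable strategies
                windows.append((states[start], start, t))
            start = t
    return windows
```

The robustness check can then be read as re-running this function with `min_length` set to each value from 7 to 11.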

Image acquisition and preprocessing
The fMRI data were collected using a 3-Tesla Siemens scanner. Whole-brain echo-planar images were acquired with a repetition time (TR) of 2000 ms and an echo time (TE) of 25 ms. Thirty-seven 4-mm slices were acquired 30° off the anteroposterior commissural line, yielding 3.4 × 3.4 × 4.0 mm³ voxels. The fMRI data had been preprocessed previously Bhatt et al. (2010) using SPM (http://www.fil.ion.ucl.ac.uk/spm) with a standard procedure. This procedure included slice-timing correction, motion correction, co-registration, normalization to the Montreal Neurological Institute template, and high-pass filtering (128 s).
To carry out the effective connectivity analysis, we controlled for event-induced dynamics, since these dynamics might constitute a common driver of activity in all of these brain regions. First, we estimated the HRF of each brain region for every subject by means of the general linear model (Supplementary Method S4; Friston et al., 1995). Second, we convolved the estimated HRF with the event train (e.g., trial onset, thinking, choice-making). We down-sampled the result to the sampling rate of the BOLD signals and regressed this event signal out of all brain regions. Third, the residual BOLD signals were detrended and corrected for head motion using 6 motion parameters (3 translations and 3 rotations).

To account for sudden head movement, we computed frame-wise head movement as the difference between adjacent volumes and converted it to a z-score. Trials in which this z-score exceeded 1 were given zero weight by the tvGCSDN in the effective connectivity analysis. Focusing on the BOLD signal for choice-making, we used only the scans between trial onset and submission of the suggested price. For the Granger causality analysis, the signals were demeaned separately for each trial Luo et al. (2017).
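The nuisance-removal steps can be sketched with NumPy as follows. Several simplifications are assumptions. A single HRF is passed in, whereas the study estimated one per region and subject. The event train is built on a fine time grid with step `dt` before down-sampling to the TR. The detrending is a linear trend. All function names are hypothetical.

```python
import numpy as np


def event_regressor(onsets_s, hrf, dt, n_scans, tr):
    """Convolve an event train with an HRF on a fine grid, then down-sample to the TR."""
    n_fine = int(round(n_scans * tr / dt))
    train = np.zeros(n_fine)
    idx = np.round(np.asarray(onsets_s, dtype=float) / dt).astype(int)
    train[idx[idx < n_fine]] = 1.0  # unit impulse at each event onset
    fine = np.convolve(train, hrf)[:n_fine]  # predicted event-evoked response
    step = int(round(tr / dt))
    return fine[::step][:n_scans]  # one sample per scan


def regress_out(bold, nuisance):
    """OLS residuals of bold (n_scans x n_rois) after intercept, linear trend and nuisance columns."""
    n = bold.shape[0]
    X = np.column_stack([np.ones(n), np.linspace(-1.0, 1.0, n), nuisance])
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return bold - X @ beta


def demean_per_trial(signal, trial_ids):
    """Subtract each trial's own mean (rows = scans), as done before Granger causality."""
    out = np.asarray(signal, dtype=float).copy()
    trial_ids = np.asarray(trial_ids)
    for trial in np.unique(trial_ids):
        mask = trial_ids == trial
        out[mask] -= out[mask].mean(axis=0)
    return out
```

To mirror the sequential order in the text, call `regress_out` twice. The first call uses the event regressor as `nuisance` (shape `(n_scans, 1)`). The second call applies the 6 motion parameters to the residuals. Finally, apply `demean_per_trial` to the scans kept for each trial.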

Dynamic information flow between brain regions during the bargaining game
To deal with both time-varying information flow and signal-dependent noise in the BOLD signal, we proposed a novel algorithm, the tvGCSDN. Briefly, the information flow between each pair of the three ROIs was estimated locally at each round ($\mathrm{IF}_{\mathrm{ROI}_1 \to \mathrm{ROI}_2}(t)$, $t = 2, \dots, 60$), using the data from all other rounds as indirect observations. The log-likelihood function was calculated as a weighted sum over all rounds. The weight of each round took a kernel form determined by both its distance from the current round and the motion parameter. More details of this algorithm are provided in Supplementary Method S1. For comparison, we also estimated the information flow between brain regions using the GC and GCSDN methods.
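The kernel-weighting idea can be illustrated with a simplified stand-in for the tvGCSDN. This sketch is **not** the tvGCSDN: it does not model signal-dependent noise. It is a lag-1 Granger causality fitted by weighted least squares, which corresponds to a Gaussian weighted log-likelihood. Each round's weight is a Gaussian kernel in round distance, set to zero for motion-flagged rounds. The names `tv_granger` and `weighted_rss`, the model order of 1, and the bandwidth of 3 rounds are all hypothetical choices.

```python
import numpy as np


def weighted_rss(X, y, w):
    """Weighted residual sum of squares of the WLS fit y ~ X."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return float(np.sum(((y - X @ beta) * sw) ** 2))


def tv_granger(source_trials, target_trials, motion_ok, bandwidth=3.0):
    """Kernel-weighted, lag-1 Granger causality source -> target at every round.

    source_trials, target_trials: lists of 1-D arrays (scans of each round,
    already demeaned per trial). motion_ok: one bool per round; False gives
    that round zero weight. Returns IF(t) = log(restricted RSS / full RSS).
    """
    full, restricted, target, round_of = [], [], [], []
    for r, (x, y) in enumerate(zip(source_trials, target_trials)):
        for k in range(1, len(y)):  # lags never cross round boundaries
            full.append([y[k - 1], x[k - 1]])
            restricted.append([y[k - 1]])
            target.append(y[k])
            round_of.append(r)
    X_full, X_restr = np.array(full), np.array(restricted)
    y_vec, round_of = np.array(target), np.array(round_of)

    n_rounds = len(source_trials)
    rounds = np.arange(n_rounds)
    ok = np.asarray(motion_ok, dtype=float)
    flow = np.full(n_rounds, np.nan)
    for t in rounds:
        # Gaussian kernel in round distance, zeroed for motion-flagged rounds.
        w = (np.exp(-0.5 * ((rounds - t) / bandwidth) ** 2) * ok)[round_of]
        if w.sum() > 0:
            flow[t] = np.log(weighted_rss(X_restr, y_vec, w) / weighted_rss(X_full, y_vec, w))
    return flow
```

The restricted model is nested in the full one under the same weights, so each local estimate is non-negative. The bandwidth trades temporal resolution against estimation variance.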
Granger causality works only when the regional variation of the hemodynamic response function (HRF) of the BOLD signal satisfies certain conditions. We therefore tested whether the regional variation of the HRF was significant among the three ROIs in the current study. The sensitivity of effective connectivity analyses depends on the neuronal delay between the source and target regions and on their relative HRF delay Schippers et al. (2011). We therefore also performed a series of simulations under different levels of neuronal delay to evaluate model performance.
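A simulation of this kind can be set up as in the sketch below. It is a minimal sketch under stated assumptions, not the authors' simulation protocol. A white-noise source drives a target after a neuronal delay. The target's HRF is shifted by a chosen hemodynamic delay, using a gamma-shaped HRF as a stand-in. Both signals are down-sampled to TR = 2 s, and static lag-1 Granger causality is computed in both directions. The function names, noise levels, and HRF shape are all hypothetical.

```python
import numpy as np


def gamma_hrf(dt, peak_shift=0.0, length=30.0):
    """Gamma-shaped HRF (shape 6, scale 1 s), optionally delayed by peak_shift seconds."""
    t = np.clip(np.arange(0.0, length, dt) - peak_shift, 0.0, None)
    return t ** 5 * np.exp(-t) / 120.0


def lag1_gc(source, target):
    """Static Granger causality source -> target with one lag: log(RSS_restricted / RSS_full)."""
    y = target[1:]
    X_restr = np.column_stack([np.ones(y.size), target[:-1]])
    X_full = np.column_stack([X_restr, source[:-1]])

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ beta) ** 2))

    return np.log(rss(X_restr) / rss(X_full))


def simulate_pair(neural_delay_s, hrf_delay_s, n_scans=600, tr=2.0, dt=0.1, seed=0):
    """Simulate x -> y with a neural delay and a delayed HRF for y; return (GC x->y, GC y->x)."""
    rng = np.random.default_rng(seed)
    n_fine = int(round(n_scans * tr / dt))
    lag = int(round(neural_delay_s / dt))
    x = rng.standard_normal(n_fine)
    y = 0.5 * rng.standard_normal(n_fine)
    y[lag:] += x[: n_fine - lag]  # y is driven by x after the neural delay
    step = int(round(tr / dt))
    bold_x = np.convolve(x, gamma_hrf(dt))[:n_fine][::step]
    bold_y = np.convolve(y, gamma_hrf(dt, hrf_delay_s))[:n_fine][::step]
    bold_x = bold_x + 0.1 * bold_x.std() * rng.standard_normal(bold_x.size)  # scanner noise
    bold_y = bold_y + 0.1 * bold_y.std() * rng.standard_normal(bold_y.size)
    return lag1_gc(bold_x, bold_y), lag1_gc(bold_y, bold_x)
```

Sweeping `neural_delay_s` and `hrf_delay_s` over a grid shows when the sign of `gc_xy - gc_yx` still recovers the true direction. It also shows when a hemodynamic delay opposing the neural one masks or reverses that direction.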

Statistical analysis
For each behavioral window defined above by the HMM, we estimated two quantities. The first was the behavioral parameter of information revelation (IR) in model (1). The second was the mean information flow between each pair of ROIs ($\mathrm{IF}_{\mathrm{ROI}_1 \to \mathrm{ROI}_2}$, averaged over all rounds within the window). Possible confounding factors were also taken into account: ROI activation (the median percentage of BOLD signal change relative to round onset), age, sex, and socioeconomic status. IQ was not used as a covariate because only 29 subjects had IQ measurements available. The mean information flow $\mathrm{IF}_{\mathrm{ROI}_1 \to \mathrm{ROI}_2}$ was compared among the three types of behavioral window by one-way analysis of variance. Before the group comparison, we regressed the above confounding factors out of the mean information flow. We then excluded outliers lying more than 2.7 standard deviations from the mean, separately for each direction and each type of behavioral window. On average, 1.6 windows per behavioral type were excluded in each direction. For the behavioral association, we calculated the partial correlation between IF and IR while controlling for the above covariates. Bonferroni correction was used to control for multiple comparisons across all 6 directions, i.e., $p < 0.05/6$.
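The statistical pipeline can be sketched with NumPy and SciPy. Several assumptions apply. Covariates are passed as a 2-D array (windows × covariates). The partial correlation is computed by residualizing both variables on the covariates, with its $p$-value using $n - 2 - k$ degrees of freedom. Outlier exclusion is applied within each window type after covariate adjustment. The helper names are hypothetical. `scipy.stats.f_oneway` and `scipy.stats.t` are standard SciPy functions.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05 / 6  # Bonferroni threshold over the 6 directed ROI pairs


def residualize(y, covariates):
    """OLS residuals of y after an intercept and the covariate columns."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def drop_outliers(x, n_sd=2.7):
    """Keep values within n_sd standard deviations of the mean."""
    z = (x - x.mean()) / x.std(ddof=1)
    return x[np.abs(z) <= n_sd]


def compare_window_types(if_values, window_type, covariates):
    """One-way ANOVA of covariate-adjusted IF across behavioral window types."""
    adjusted = residualize(if_values, covariates)
    groups = [drop_outliers(adjusted[window_type == k]) for k in np.unique(window_type)]
    return stats.f_oneway(*groups)


def partial_correlation(a, b, covariates):
    """Partial correlation of a and b controlling for covariates (n x k), with two-sided p."""
    r = np.corrcoef(residualize(a, covariates), residualize(b, covariates))[0, 1]
    df = len(a) - 2 - covariates.shape[1]
    t = r * np.sqrt(df / (1.0 - r ** 2))
    return r, 2 * stats.t.sf(abs(t), df)
```

Each of the 6 directions would be tested separately, and a result would be reported as significant only if its $p$-value falls below `ALPHA`.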

Data and code availability statement
The datasets and code generated and analysed during the current study are available at https://github.com/rhyang2021/data-code4TVGCSDN. A Matlab toolbox implementing this algorithm is also available at https://github.com/qluo2018/GCSDN. Dataset: Dynamic neural reconfiguration for distinct strategies during competitive social interactions (Mendeley Data).

Human ethics statements
The work described has been carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans.

Declaration of Competing Interest
The authors declare that they have no conflict of interest.