The neural network underpinning social feedback contingent upon one's action: An fMRI study

Praise enhances motor performance; however, the underlying feedback pathway is unknown. Here, we hypothesized that social evaluative feedback to the motor system is modified by the top-down effect of the social contingency valuation system, such as the anterior rostral medial prefrontal cortex (arMPFC). We developed a pseudo-interactive task that simplified a conversational student-teacher interaction and conducted a functional magnetic resonance imaging study with 33 participants (13 men, 20 women; mean age = 21.7 years; standard deviation = 2.0 years). The participant inside the scanner uttered a pseudo-English word to an English teacher outside the scanner. The teacher provided feedback of acceptance or rejection, by either gestures or words, via video. As a control condition, the pseudo-word was read aloud by a computer. Approval from the teacher increased the participants' pleasure ratings. Feedback to the participants' own utterances, whether rejection or acceptance, activated the arMPFC. Irrespective of whether the preceding utterance was made by the participant or the computer, acceptance compared with rejection activated the right primary visual cortex (V1), and the reverse contrast activated the left V1. This valence-dependent laterality of V1 activation indicates that the effect is not a domain-general modulation of visual processing. Instead, the early visual cortices are part of the valence-specific representation of the social signal. Physio-physiological interaction analysis with seed regions in the right and left V1 and the modulator region in the arMPFC showed enhanced connectivity with the bilateral primary motor cortex. These findings indicate that socially contingent, self-relevant signals from others act as feedback to the motor control system, and that this process is mediated by the early visual cortex.


Introduction
Praise is defined as "positive evaluations made by a person, of another's products, performances, or attributes, where the evaluator presumes the validity of the standards on which the evaluation is based" ( Kanouse et al., 1981 ). Praise can boost self-efficacy ( Bandura, 1977 ) and positive feelings such as competence and autonomy ( Blumenfeld et al., 1982 ; Deci and Ryan, 1985 ), strengthen the association between responses and their positive outcomes ( O'Leary and O'Leary, 1977 ), and provide incentives for task engagement ( Madsen et al., 1977 ). In motor skill learning, praise has been hypothesized to serve as feedback on the level of a participant's competence ( Catano, 1975 ) that acts as an incentive to enhance practice efforts ( Steers and Porter, 1974 ). Thus, praise may accelerate motor learning by acting as an input signal that induces a plastic change in the nervous system, called an engram ( Josselyn et al., 2015 ). It follows that the areas where engrams are formed should receive such feedback signals. A previous functional neuroimaging study in humans indicated that motor engrams are formed in the primary motor cortex and other motor cortical areas ( Hamano et al., 2020 ).
Neural substrates of the effect of social signals on procedural learning are poorly understood. Recently, Doppler et al. (2019) showed that the effect of social reward on procedural learning depends on age: social reward particularly improved motor sequence learning in older participants, whereas it improved the consolidation of motor sequence knowledge only in younger participants. In their voxel-based morphometry analysis of gray matter volume change, the corresponding neural correlates were the left striatum in the younger group and the medial orbitofrontal cortex in the older group. However, the functional neural pathways through which social signals are fed back to the motor system remain unknown.
Previously, we found that others' positive responses that are contingent upon and relevant to one's own action are recognized as a social reward through social contingency detection, which is mediated by the anterior rostral medial prefrontal cortex (arMPFC) ( Sumiya et al., , 2017 ). Here, we hypothesized that social feedback to the motor system is modified by the top-down effect of the social contingency valuation system, such as the arMPFC ( Sumiya et al., 2017 ). To test this hypothesis, we developed a task that simplified a conversational student-teacher interaction to induce modest degrees of rejection and acceptance in an ecologically valid situation, and conducted a functional MRI study with 33 participants. In this task, the participant inside the scanner uttered a pseudo-English word to an English teacher outside the scanner (SELF condition). Ostensibly on the basis of the English-likeness of the pronunciation, the participant then received feedback in a short video of the teacher, who expressed acceptance or rejection by either gestures or words. As a control condition, the pseudo-word was read aloud by the computer instead of the participant (PC condition). We expected the teacher's response to elicit valence-dependent activation that affects the motor system in a manner dependent on social contingency.

Participants
A total of 33 healthy individuals aged 19-27 years (13 men and 20 women; mean age = 21.7 years; standard deviation [SD] = 2.0 years) participated in this study. We analyzed data from 27 participants (9 men and 18 women, aged 19 to 29 years, mean ± SD age = 21.85 ± 2.14 years) after excluding six participants because of missing responses (failure to respond on more than two trials of the rating phase). All participants were native Japanese speakers and right-handed according to the Edinburgh Handedness Inventory ( Oldfield, 1971 ). The Versant English Test was used to assess participants' English proficiency. This 15-minute computerized test measures spoken English skills on a scale of 20-80. Fewer than 5% of native English speakers reportedly score below 68, whereas English learners are distributed over a wide range of scores, with only 5% of learners scoring above 68 ( Pearson Education Inc., 2011 ). We were unable to collect the test score of one participant because of a computer error. The mean "overall" score of the remaining 26 participants was 30.23 (SD = 5.23), which corresponds to level A1 ("basic user") in the Common European Framework of Reference for Languages (CEFR), a guideline used to describe the achievement level of learners of foreign languages. No participant had a history of symptoms requiring neurological, psychological, or other medical care. All participants provided written informed consent. This study was approved by the ethical committee of the National Institute for Physiological Sciences of Japan (Issue No. 16A016). All methods were performed in accordance with the approved guidelines.

Pseudo-interactive task
All participants completed the pseudo-interactive task. In each trial, one of two speakers (SELF or PC) uttered a pseudo-English word, and a teacher responded after the utterance. The teacher's three responses (accept, reject, and control) were delivered in two modalities (Video and Word). In the Video runs, nodding indicated acceptance, shaking the head indicated rejection, and no movement indicated feedback somewhere between acceptance and rejection. There was only one teacher and one video clip for each condition. In the Word runs, "GOOD" indicated acceptance, "BAD" indicated rejection, and "SOSO" indicated feedback somewhere between acceptance and rejection. Accordingly, for each response modality, the task contained six conditions: SELF_accept (i.e., self-utterance of a pseudo-word followed by nodding in Video runs or "GOOD" in Word runs), SELF_reject, SELF_control, PC_accept, PC_reject, and PC_control.
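The factorial structure described above can be written out explicitly. The following Python sketch is only an illustration: the condition labels and stimuli mirror the text, while the helper `conditions_for` and the `STIMULUS` lookup table are hypothetical names, not part of the original experiment code.

```python
from itertools import product

# Factor levels taken from the task description.
SPEAKERS = ("SELF", "PC")
RESPONSES = ("accept", "reject", "control")
MODALITIES = ("Video", "Word")

# Hypothetical lookup: which teacher stimulus realizes each response in each modality.
STIMULUS = {
    ("Video", "accept"): "nodding",
    ("Video", "reject"): "shaking the head",
    ("Video", "control"): "no movement",
    ("Word", "accept"): "GOOD",
    ("Word", "reject"): "BAD",
    ("Word", "control"): "SOSO",
}


def conditions_for(modality: str) -> list[tuple[str, str]]:
    """Return (condition label, stimulus) pairs for the six conditions of one modality."""
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality!r}")
    return [
        (f"{speaker}_{response}", STIMULUS[(modality, response)])
        for speaker, response in product(SPEAKERS, RESPONSES)
    ]


# Full design: 2 modalities x 2 speakers x 3 responses = 12 cells.
ALL_CELLS = [(m, label) for m in MODALITIES for label, _ in conditions_for(m)]

if __name__ == "__main__":
    for label, stim in conditions_for("Word"):
        print(label, "->", stim)
    print(len(ALL_CELLS), "cells in total")
```

Crossing the two speakers with the three responses yields the six condition labels in the order listed in the text; repeating this for both modalities gives the 12 cells of the full design.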

Stimuli
Pseudo-words : To create the stimulus lists, we used the ARC Nonword Database ( Rastle et al., 2002 ) ( http://www.cogsci.mq.edu.au/research/resources/nwdb/nwdb.html ), which allows nonwords to be selected on the basis of a number of psycholinguistic dimensions. This database guarantees "legality," that is, conformity to the phonotactic constraints, based on the sound-spelling and spelling-sound relationships of real English words, that characterize Australian and Standard Southern British English ( Rastle et al., 2002 ). First, we listed pseudo-words from this database with the following restrictions: the pseudo-words had to have orthographically existing onsets and bodies in English, contain only legal bigrams (i.e., every pair of adjacent letters is permissible in English), and consist only of monomorphemic syllables (i.e., syllables consisting of a single morpheme that cannot be divided into smaller parts). All pseudo-words were four letters long. From the resulting list, we chose words with 10 different onsets, /b/, /d/, /f/, /g/, /h/, /k/, /l/, /p/, /s/, and /t/, and then created 12 lists, each containing 10 words with different onsets. We adjusted the lists so that their linguistic characteristics did not differ among the 12 lists. See the Appendix for the list of words.
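One way to meet the constraint that each of the 12 lists contains exactly one word per onset is to shuffle the candidates for each onset and deal them out across the lists. The Python sketch below assumes a hypothetical candidate pool made of placeholder strings, not the actual Appendix words. It enforces only the onset constraint; the matching of linguistic characteristics across lists described above would still need to be checked separately.

```python
import random

ONSETS = ("b", "d", "f", "g", "h", "k", "l", "p", "s", "t")
N_LISTS = 12


def build_lists(pool: dict[str, list[str]], n_lists: int = N_LISTS, seed: int = 0) -> list[list[str]]:
    """Deal words into n_lists lists so that every list has exactly one word per onset.

    pool maps each onset letter to its candidate pseudo-words (hypothetical input).
    """
    rng = random.Random(seed)  # fixed seed -> reproducible assignment
    columns = []
    for onset in ONSETS:
        words = list(pool[onset])
        if len(words) < n_lists:
            raise ValueError(f"need {n_lists} words with onset /{onset}/, got {len(words)}")
        rng.shuffle(words)
        columns.append(words[:n_lists])
    # Transpose: list i takes the i-th shuffled word of every onset.
    return [[column[i] for column in columns] for i in range(n_lists)]


if __name__ == "__main__":
    # Placeholder four-character items standing in for real pseudo-words.
    demo_pool = {onset: [f"{onset}{i:03d}" for i in range(15)] for onset in ONSETS}
    lists = build_lists(demo_pool)
    print(len(lists), "lists;", sum(len(lst) for lst in lists), "words")
```

Twelve lists of 10 words give 120 items. This is consistent with four runs of 30 word trials each (six conditions presented five times per run), if every word is used only once.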
Auditory stimuli : We recorded the pseudo-words read aloud by a computer, using the built-in text-to-speech function of macOS with the voice of the American English speaker "Tom" (Apple Inc., CA, USA). Although this voice is computer-generated, its quality is high and it sounds natural. Only words pronounced with natural intonation were adopted as stimuli.
Visual stimuli : There were two types of visual responses in this study: the Video response and the Word response. We recorded a female experimenter in front of a black screen nodding, shaking her head at a natural speed, or keeping her head still. These short video clips (approximately 2.5 s long) were used as the Video response: the nodding, head-shaking, and still clips were used in the accept, reject, and control conditions, respectively. For the Word response, we displayed text (GOOD, BAD, or SOSO) below a still image from the video clip of the control condition.

Stimulus presentation
Participants lay in the MR scanner with their ears plugged and with tight but comfortable foam padding placed around the head. We used Presentation software (Neurobehavioral Systems, Albany, CA, USA) (RRID: SCR_002521) to present visual and auditory stimuli and record button responses. Visual stimuli were projected with a liquid-crystal display projector (CP-SX12000J; Hitachi Ltd., Tokyo, Japan) onto a half-transparent screen, which participants viewed via a mirror placed above the head coil. The viewing angle was large enough for participants to observe the stimuli (13.1° [horizontal] × 10.5° [vertical] at maximum). Participants listened to auditory stimuli through ceramic headphones (KIYOHARA-KOUGAKU, Tokyo, Japan). Participants' utterances were recorded with an opto-microphone system (KOBATEL Corporation, Kanagawa, Japan). Behavioral responses were collected via an optical button box (HHSC-1×4; Current Designs Inc., Philadelphia, PA, USA).
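For reference, the visual angle θ subtended by a stimulus of size s at viewing distance d is θ = 2·arctan(s / 2d). The Python sketch below implements this standard formula and its inverse. The viewing distance in the example is a hypothetical value, because the mirror-to-screen distance is not reported here.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (degrees) subtended by an object of size_cm at distance_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse: physical size (cm) that subtends angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    distance = 100.0  # hypothetical viewing distance in cm, not reported in the text
    width = size_for_angle_cm(13.1, distance)
    height = size_for_angle_cm(10.5, distance)
    print(f"{width:.1f} cm x {height:.1f} cm at {distance:.0f} cm")
```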

Cover story
Participants were instructed to read the pseudo-word aloud in one condition and to listen to pseudo-words played by a computer in the other condition. Before the experiment, participants met an English teacher (whose face appears in the Video and Word responses). They were told that this teacher would listen to them reading out the pseudo-words from another room and would evaluate the English-likeness of the pronunciation by pressing one of three buttons corresponding to the three responses. That is, the participants were told that they would see a prerecorded video stimulus triggered by the teacher. Participants were told that the teacher would evaluate the pronunciation of both the participants and the PC in terms of "English-likeness," that is, how natural and understandable the pronunciation was. They were informed that fluent (fast-talking) pronunciation does not enhance English-likeness. In reality, the teacher's responses were pre-determined (as described in the section on stimuli). The teacher was not visible to the participants during the experiment but was visible during the practice trials. To make the cover story as believable as possible, we had the participants watch the teacher actually pressing a button during the practice trials. The button was pressed immediately after the participant or PC read the pseudo-word aloud, so that participants would feel that they were really being evaluated. However, because the buttons were not actually functional, the feedback stimuli (videos and words) were presented in a pre-determined order. After the experiment, participants were debriefed to confirm that they had believed that a teacher was really evaluating them. All participants filled out a written "post-experiment questionnaire" and were also interviewed orally by the experimenter.
No participant gave any comments that suggested they were suspicious of the cover story.

Task schedule
In total, the experiment lasted 2.5 hours, including preparation time. Participants completed four runs (two Video and two Word runs), each lasting 465 s (930 volumes per run). Each run consisted of 35 trials of 12.5 s each (437.5 s in total). Each of the six conditions, plus a null condition in which only a fixation was shown for 12.5 s, was presented five times in each run. We inserted a 20-s baseline before the first trial and a 7.5-s baseline after the last trial (437.5 + 27.5 = 465 s). Fig. 1 shows the task sequence of each trial.
Fig. 1. Each trial consisted of five phases: preparation, speaker's action, teacher's response, rating, and rest. In the preparation phase, the participant silently read the pseudo-English word. Two conditions were prepared for the action phase: when the frame of the screen turned red, the participant read the word aloud (SELF condition), whereas when it turned blue, the participant listened to the word read aloud by the computer (PC condition). Each word was new and presented only once. In the response phase, the participant observed one of three responses from the teacher: acceptance (nodding/"GOOD"), rejection (shaking the head/"BAD"), or control (no movement/"SOSO"). The participant then rated their pleasure by pressing buttons in the rating phase. Activity during the task was modeled with boxcar functions for each phase except rest, and the regressors shown were convolved with the canonical hemodynamic response function. We focused our analysis on the listener's response phase (yellow frame).
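The run-level timing above can be checked arithmetically. In this Python sketch, the values come from the text. The repetition time (TR) of 0.5 s is not stated explicitly; it is implied by dividing the run duration by the number of volumes, assuming continuous acquisition throughout the run.

```python
# Run-level timing reported in the task schedule.
N_CONDITIONS = 6          # 2 speakers x 3 teacher responses
N_NULL = 1                # fixation-only null condition
REPS_PER_RUN = 5
TRIAL_S = 12.5
BASELINE_START_S = 20.0
BASELINE_END_S = 7.5
VOLUMES_PER_RUN = 930


def trials_per_run() -> int:
    return (N_CONDITIONS + N_NULL) * REPS_PER_RUN


def run_duration_s() -> float:
    return trials_per_run() * TRIAL_S + BASELINE_START_S + BASELINE_END_S


def implied_tr_s() -> float:
    """TR implied by run length / number of volumes (assumes continuous acquisition)."""
    return run_duration_s() / VOLUMES_PER_RUN


if __name__ == "__main__":
    print(trials_per_run(), "trials,", run_duration_s(), "s, TR =", implied_tr_s(), "s")
```

Under the same assumption, discarding the first 10 functional volumes of each run corresponds to the first 5 s, which falls within the 20-s initial baseline.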

Trial sequence
First, in the preparation phase, the pseudo-word was presented visually on the screen for 3 s. Second, in the speaker's action phase, one of two frame colors was superimposed on the visual stimulus. When a red frame appeared, the participant was asked to read the pseudo-word aloud (SELF condition); when a blue frame appeared, the participant was asked to listen to the pseudo-word read aloud by the PC (PC condition). This phase lasted 2 s and was followed by a 1-s resting period showing a fixation. Third, in the listener's response phase, one of the three levels of video response was presented in Video runs, with "XXXX" shown under each video as a control visual stimulus for the Word conditions. In Word runs, one of the three levels of word response was presented, with the still face of the teacher shown above each word as a control visual stimulus for the Video conditions. In both run types, this phase lasted 2.5 s. Finally, in the rating phase, participants reported their degree of subjective pleasure on a 7-point Likert scale (1 = no pleasure, 7 = very pleasurable). Participants pressed two buttons with their right index and middle fingers to move a cursor along the row of numbers presented at the bottom of the video screen until it reached the number corresponding to their rating. The initial cursor position on the scale was pseudorandomized to counterbalance attentional bias. A 1-s resting period showing a fixation followed the rating phase.
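The phase durations stated above account for 9.5 s of each 12.5-s trial. The duration of the rating phase is not stated. If the phases fill the trial exactly, it would be 3 s. The Python sketch below makes that inference explicit, flagging it as an assumption in the code, and derives the within-trial onsets. These include the 6-s onset of the listener's response phase, which is where the regressors of interest are modeled.

```python
TRIAL_S = 12.5

# (phase, duration in s); None marks a duration the text does not state.
PHASES = [
    ("preparation", 3.0),
    ("action", 2.0),
    ("rest", 1.0),
    ("response", 2.5),
    ("rating", None),  # ASSUMPTION: inferred as the remainder of the trial
    ("rest", 1.0),
]


def phase_schedule(phases, trial_s=TRIAL_S):
    """Return (phase, onset_s, duration_s) tuples, filling at most one unknown duration."""
    unknown = [name for name, dur in phases if dur is None]
    if len(unknown) > 1:
        raise ValueError("can infer at most one missing duration")
    known_total = sum(dur for _, dur in phases if dur is not None)
    remainder = trial_s - known_total
    if remainder < 0:
        raise ValueError("stated durations exceed the trial length")
    schedule, t = [], 0.0
    for name, dur in phases:
        dur = remainder if dur is None else dur
        schedule.append((name, t, dur))
        t += dur
    return schedule


if __name__ == "__main__":
    for name, onset, dur in phase_schedule(PHASES):
        print(f"{name:<12} onset {onset:4.1f} s  duration {dur:.1f} s")
```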

Data analysis
Data pre-processing
Image processing and statistical analyses were performed using the Statistical Parametric Mapping package (SPM12; Wellcome Trust Centre for Neuroimaging, London, UK) (RRID: SCR_007037). The first 10 functional images of each run were discarded to allow the signal to reach equilibrium, and the remaining volumes were used for subsequent analyses. The imaging data were first pre-processed using MELODIC in the FSL software ( http://www.fmrib.ox.ac.uk/fsl/melodic/index.html ). Pre-processing consisted of 1) rigid-body head-motion correction using MCFLIRT (Motion Correction FMRIB's Linear Image Registration Tool), 2) slice-timing correction (FSL's "Regular up" slice order), 3) brain extraction using the Brain Extraction Tool, 4) spatial smoothing with a 4-mm full-width at half-maximum (FWHM) Gaussian kernel, and 5) high-pass temporal filtering (cut-off = 100 s). Because head motion is known to affect fMRI results, we conducted thorough artifact removal with FSL's FIX tool (FMRIB's ICA-based Xnoiseifier) ( Salimi-Khorshidi et al., 2014 ). We chose a conservative FIX threshold of 20 (this threshold determines the binary classification of each component) to reduce the risk of removing signal components. Noise ICA components (such as movement-related components, white matter fluctuations, susceptibility-related artifacts, cardiac pulsation, and large veins) were identified manually on the basis of their spatial and temporal features via FSL's MELODIC ICA tool by a researcher (M.S.), following published guidelines ( Griffanti et al., 2017 ). After denoising, further processing and statistical analyses were performed using SPM12. Each participant's T1-weighted anatomical image was co-registered to the mean of that participant's EPI images and normalized to Montreal Neurological Institute (MNI) space using the DARTEL procedure ( Ashburner, 2007 ).
More specifically, each anatomical image was segmented into tissue class images using the unified segmentation approach. Gray and white matter images were registered and normalized to MNI space using a pre-existing template based on 512 brains of Japanese individuals scanned at the NIPS. The parameters from DARTEL registration and normalization were then applied to each functional image and to the T1-weighted anatomical image. The normalized functional images were smoothed using a Gaussian kernel of 4-mm full-width at half-maximum in the x-, y-, and z-axes.
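Gaussian smoothing kernels are usually specified by their full-width at half-maximum, which relates to the standard deviation σ by FWHM = 2·sqrt(2 ln 2)·σ ≈ 2.355σ. The convolution of two Gaussians is a Gaussian whose variance is the sum of the two variances, so successive smoothing steps combine in quadrature. The Python sketch below applies these standard relations. If both 4-mm smoothing steps described in this section were applied to the same data, the applied smoothing would be roughly 5.7 mm FWHM. This estimate ignores the intrinsic smoothness of the data and any effect of resampling between the steps.

```python
import math

# FWHM = 2 * sqrt(2 * ln 2) * sigma for a Gaussian kernel.
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation (mm) of a Gaussian kernel with the given FWHM."""
    return fwhm_mm / FWHM_PER_SIGMA


def combined_fwhm(*fwhms_mm: float) -> float:
    """FWHM after applying several Gaussian kernels in sequence (variances add)."""
    return math.sqrt(sum(f * f for f in fwhms_mm))


if __name__ == "__main__":
    print(f"sigma for 4 mm FWHM: {fwhm_to_sigma(4.0):.3f} mm")
    print(f"two 4-mm passes: {combined_fwhm(4.0, 4.0):.2f} mm FWHM")
```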

Behavioral data
Pleasure ratings were analyzed with the repeated-measures general linear model (GLM) procedure in IBM SPSS Statistics for Windows (Version 24.0; IBM Corp., Armonk, NY, USA). Results with p < .05 were considered significant. Figures were created using GraphPad Prism (Version 5.03) for Windows (GraphPad Software, San Diego, CA, USA; www.graphpad.com ).
Specifically, we tested the following three hypotheses:
• (1) Considering that acceptance is a social reward ( Izuma et al., 2008 ), we hypothesized that acceptance of self-action would increase pleasure ratings more than acceptance of another's action.
• (2) As social rejection is psychologically "painful" ( Eisenberger et al., 2003 ), we expected rejection of self-action to decrease pleasure ratings more than rejection of another's action.
• (3) As social contingency (another's reaction contingent upon self-action) is rewarding ( Sumiya et al., 2017 ), we predicted that gestures would be more effective than words in inducing positive valence scores.

Neuroimaging data
Initial individual analysis : After pre-processing, task-related activation was evaluated using a GLM ( Friston et al., 1994 ; Moutoussis et al., 2014 ). The design matrix contained regressors of three fMRI runs. Each run included six regressors of interest (2 Speakers × 3 Teacher's Responses) modeled at the onset of the listener's response (gesture [Video] and Word responses were presented in separate runs). The duration of each regressor was 2.5 s, corresponding to the duration of the teacher's response (see Fig. 1 ). In addition, each run included the following five regressors: one regressor for the preparation phase, two regressors for the speaker's action phase (SELF or PC), and one regressor for the button press during the rating phase. A GLM was fitted to the fMRI data of each participant. The blood oxygen level-dependent (BOLD) signal time series were modeled using boxcar functions convolved with the canonical hemodynamic response function. Six rigid-body head-motion parameters (three displacements and three rotations) were included as regressors of no interest. We also applied a high-pass filter with a cut-off of 128 s to remove low-frequency signal components. Because the traditional AR(1) + white noise model can fail to whiten data acquired with a short TR, temporal autocorrelations were modeled and estimated from the pooled active voxels with the FAST model and used to whiten the data ( Corbin et al., 2018 ). This alternative pre-whitening method has been reported to perform better than SPM's default ( Olszowy et al., 2019 ). No global scaling was performed. Least-squares estimation was applied to the whitened data. Contrast images were constructed as weighted sums of the parameter estimates in the individual analyses; they represented the normalized task-related increment of the MR signal of each participant.
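The regressor construction described above (a boxcar at each response onset, 2.5 s long, convolved with a canonical HRF) can be sketched in plain Python. This is a simplified illustration, not SPM's implementation. The double-gamma shape uses the commonly cited SPM-style parameters (response peak near 5 s, undershoot near 15 s, undershoot ratio 1/6). The kernel is sampled directly at the TR rather than at SPM's finer microtime resolution, and the 0.5-s TR is an inference from 930 volumes per 465-s run.

```python
import math


def gamma_pdf(t: float, shape: float) -> float:
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(t) - t - math.lgamma(shape))


def double_gamma_hrf(dt: float, length_s: float = 32.0) -> list[float]:
    """SPM-style double-gamma HRF sampled every dt seconds, normalized to unit sum."""
    n = int(round(length_s / dt)) + 1
    h = [gamma_pdf(i * dt, 6.0) - gamma_pdf(i * dt, 16.0) / 6.0 for i in range(n)]
    total = sum(h)
    return [v / total for v in h]


def boxcar(onsets_s: list[float], duration_s: float, n_scans: int, tr: float) -> list[float]:
    """1 during each event (onset to onset + duration), 0 elsewhere."""
    x = [0.0] * n_scans
    for onset in onsets_s:
        start = int(round(onset / tr))
        stop = int(round((onset + duration_s) / tr))
        for i in range(max(start, 0), min(stop, n_scans)):
            x[i] = 1.0
    return x


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Discrete convolution of x with kernel h, truncated to len(x)."""
    return [sum(h[j] * x[i - j] for j in range(min(i + 1, len(h)))) for i in range(len(x))]


if __name__ == "__main__":
    TR = 0.5  # inferred, not stated explicitly in the text
    # Hypothetical example: response phases starting 6 s into two consecutive 12.5-s trials.
    onsets = [6.0, 18.5]
    regressor = convolve(boxcar(onsets, 2.5, n_scans=80, tr=TR), double_gamma_hrf(TR))
    peak = max(range(len(regressor)), key=regressor.__getitem__)
    print(f"regressor peaks at {peak * TR:.1f} s")
```

SPM builds these regressors at a finer time resolution and then down-samples them to the scan times. The direct sampling here is a simplification to keep the example short.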
Random-effects analysis : Contrast images from the individual analyses were used for the group analysis. We adopted a flexible factorial design to construct a single design matrix involving the 2 × 2 × 3 task conditions in the listener's response phase. All conditions were modeled as within-participant (dependent) levels, and unequal variance among conditions was assumed. The estimates for the conditions were compared using linear contrasts, and the resulting voxel values for each contrast constituted a statistical parametric map of the t-statistic (SPM{t}). Given our hypotheses, we evaluated the predefined contrasts shown in Table 1. The statistical threshold for the spatial extent test on clusters was set at p < .05, corrected for multiple comparisons (family-wise error [FWE]) over the whole brain, with a height threshold of uncorrected p < .001 ( Friston et al., 1996 ). Brain regions were anatomically defined and labeled according to the Automated Anatomical Labeling ( Tzourio-Mazoyer et al., 2002 ) Atlas 3 ( Rolls et al., 2020 ).
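Interaction contrasts of the kind listed in Table 1 can be built as Kronecker products of factor-level weights, and a conjunction can be assessed voxel by voxel with the minimum statistic. The Python sketch below assumes a hypothetical cell ordering: Modality, then Speaker, then Response, with Response varying fastest. It uses the two interactions named in the Discussion, (SELF - PC) × (Accept - Control) and (SELF - PC) × (Reject - Control), pooled over modality. The actual column order of the SPM design matrix and the exact Table 1 weights may differ.

```python
from itertools import product

MODALITIES = ("Video", "Word")
SPEAKERS = ("SELF", "PC")
RESPONSES = ("Accept", "Reject", "Control")

# ASSUMED column order of the 12 cells: Response varies fastest.
CELLS = list(product(MODALITIES, SPEAKERS, RESPONSES))


def kron(a: list[float], b: list[float]) -> list[float]:
    """Kronecker product of two weight vectors."""
    return [x * y for x in a for y in b]


def interaction(speaker_w, response_w, modality_w=(1, 1)) -> list[float]:
    """Weights for a Speaker x Response interaction, pooled over modality."""
    return kron(list(modality_w), kron(list(speaker_w), list(response_w)))


SELF_MINUS_PC = (1, -1)
ACCEPT_MINUS_CONTROL = (1, 0, -1)
REJECT_MINUS_CONTROL = (0, 1, -1)

C_ACCEPT = interaction(SELF_MINUS_PC, ACCEPT_MINUS_CONTROL)
C_REJECT = interaction(SELF_MINUS_PC, REJECT_MINUS_CONTROL)


def conjunction_min_t(*t_maps: list[float]) -> list[float]:
    """Minimum-statistic conjunction: a voxel survives only if it survives in every map."""
    return [min(values) for values in zip(*t_maps)]


if __name__ == "__main__":
    for cell, w in zip(CELLS, C_ACCEPT):
        print(cell, w)
    # Toy t values for three voxels (not real data).
    print(conjunction_min_t([3.9, 1.2, 5.0], [4.4, 6.0, 3.4]))
```

Thresholding the minimum map at the height threshold keeps only the voxels that exceed it in every contrast, which is what a conjunction of the two interactions requires.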
Physio-physiological Interaction (PPI) analysis : We conducted PPI analyses ( Friston et al., 1997 ) to test the hypothesis that self-related activity in the arMPFC modulates the functional connectivity between the visual cortex and the motor control system. To define the seed regions, we identified the top peak coordinates of the activation revealed by each effect of the utterance task: the arMPFC for the effect of self-action and the visual cortex for the effect of the listener's response (accept or reject). Because acceptance activated the right V1 and rejection activated the left V1, irrespective of whether the action was the participant's own or the PC's, the left and right V1 were defined as separate seed regions. Each seed was a sphere (radius: 4 mm) centered on the corresponding peak coordinate. We extracted the signal time series from each seed region after adjusting for effects of no interest with F contrasts.
We then calculated the PPI terms between the arMPFC and V1 in four steps. First, the MR signal of each seed region was extracted as an eigenvariate time series. Second, the extracted MR signal was deconvolved with the canonical hemodynamic response function (HRF); the resulting time series approximated neural activity ( Gitelman et al., 2003 ). Third, the neural time series of the two seed regions were detrended and multiplied element-wise (time point by time point), so that the resulting time series represented the interaction of neural activity between the two seed regions. We calculated this interaction between the arMPFC and right V1, as well as between the arMPFC and left V1. Finally, the interaction time series were convolved with the HRF, yielding an interaction variable at the hemodynamic level (the PPI term).
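The third and fourth steps can be sketched as follows. The Python code assumes that the deconvolved (neural-level) time series of the two seeds are already available, because the deconvolution of the second step ( Gitelman et al., 2003 ) is beyond a short example. The linear detrending, the toy inputs, and the function names are illustrative assumptions, not SPM's implementation. The key point is that the product is taken time point by time point, so the interaction remains a time series rather than collapsing to a single number.

```python
def detrend_linear(x: list[float]) -> list[float]:
    """Remove the least-squares linear trend (and mean) from a series."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    slope = sum((i - t_mean) * (v - x_mean) for i, v in enumerate(x)) / sxx
    return [v - x_mean - slope * (i - t_mean) for i, v in enumerate(x)]


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Discrete convolution of x with kernel h, truncated to len(x)."""
    return [sum(h[j] * x[i - j] for j in range(min(i + 1, len(h)))) for i in range(len(x))]


def ppi_term(neural_a: list[float], neural_b: list[float], hrf: list[float]) -> list[float]:
    """Physio-physiological interaction regressor at the hemodynamic level."""
    if len(neural_a) != len(neural_b):
        raise ValueError("seed time series must have equal length")
    a, b = detrend_linear(neural_a), detrend_linear(neural_b)
    interaction = [u * v for u, v in zip(a, b)]  # element-wise, one value per scan
    return convolve(interaction, hrf)


if __name__ == "__main__":
    armpfc = [0.0, 1.0, 0.5, -0.5, 1.5, 0.0]  # toy neural estimates
    v1 = [1.0, 0.0, 1.0, 0.5, -1.0, 0.0]
    toy_hrf = [0.0, 0.6, 0.3, 0.1]  # placeholder kernel; a canonical HRF would be used in practice
    print(ppi_term(armpfc, v1, toy_hrf))
```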
For each participant, we constructed two design matrices: one for the arMPFC and right V1 and the other for the arMPFC and left V1. Each design matrix involved three regressors: the PPI term between the arMPFC and the V1 of one hemisphere, and two regressors representing the MR signal time series of these seed regions. In the group analysis, we conducted two-sample t-tests on the contrast images of the PPI terms obtained from these individual analyses. To identify the feedback target common to both positive and negative feedback, we conducted a conjunction analysis. We applied the same statistical thresholds as in the analysis of brain activation ( p < .05, FWE corrected at the cluster level, with a height threshold of t (37) > 3.33, corresponding to p < .001 uncorrected).
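At the participant level, each design matrix described above can be fitted by ordinary least squares, and the coefficient of the PPI column is the quantity carried to the group analysis. The dependency-free Python sketch below adds a constant column and solves the normal equations directly. It omits the whitening, filtering, and nuisance regressors of a real SPM model, and the toy data are synthetic.

```python
def ppi_design(ppi: list[float], seed_a: list[float], seed_b: list[float]) -> list[list[float]]:
    """Design matrix rows: [PPI term, seed A signal, seed B signal, constant]."""
    return [[p, a, b, 1.0] for p, a, b in zip(ppi, seed_a, seed_b)]


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Least-squares betas from (X'X) beta = X'y, via Gauss-Jordan elimination."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


if __name__ == "__main__":
    # Synthetic example: the signal is built with a known PPI weight of 2.0.
    ppi = [float(i * i) for i in range(10)]
    seed_a = [float((3 * i) % 7) for i in range(10)]
    seed_b = [float((5 * i) % 11) for i in range(10)]
    y = [2.0 * p + 0.5 * a - 1.0 * b + 3.0 for p, a, b in zip(ppi, seed_a, seed_b)]
    betas = ols(ppi_design(ppi, seed_a, seed_b), y)
    print("PPI beta:", round(betas[0], 6))
```

Including both seed time series as separate regressors means the PPI beta reflects the interaction over and above each region's main effect, which is the usual rationale for this three-regressor design.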

Behavior
A 2 (Modality: Video or Word) × 2 (Speaker: SELF or PC) × 3 (Response type: Accept, Reject, or Control) repeated-measures analysis of variance (rmANOVA) revealed a significant main effect of Response type ( F (1.4, 36.399) = 193.823, p < .001, ηp² = .882). Post-hoc pairwise comparisons with Bonferroni correction revealed that pleasure ratings were higher in the Accept condition than in the Control and Reject conditions (Accept > Control, p < .001; Accept > Reject, p < .001) and lower in the Reject condition than in the Control condition (Control > Reject, p < .001). There were significant interactions between Speaker and Response type ( F (1.160, 30.150) = 26.958, p < .001, ηp² = .509) and between Modality and Response type ( F (2, 52) = 8.807, p = .001, ηp² = .253). Post-hoc pairwise comparisons with Bonferroni correction revealed that in the Reject condition, pleasure ratings were higher in the PC than in the SELF condition (PC > SELF, p < .001), whereas in the Accept condition, pleasure ratings were higher in the SELF than in the PC condition (SELF > PC, p < .001). Additionally, in the Accept condition, pleasure ratings were higher in the Video than in the Word condition (Video > Word, p = .002), whereas in the Control condition, pleasure ratings were higher in the Word than in the Video condition (Word > Video, p = .012).
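The reported effect sizes can be recomputed from each F statistic and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below does this for the three reported effects. It also shows that, for each effect, df_error / df_effect is close to 26 (27 participants minus 1). The fractional degrees of freedom are consistent with a sphericity correction (for example, Greenhouse-Geisser), although the text does not state which correction was used.

```python
def partial_eta_squared(f: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared) from the Behavior results.
REPORTED = [
    ("Response type", 193.823, 1.4, 36.399, 0.882),
    ("Speaker x Response type", 26.958, 1.160, 30.150, 0.509),
    ("Modality x Response type", 8.807, 2.0, 52.0, 0.253),
]

if __name__ == "__main__":
    for label, f, df1, df2, reported in REPORTED:
        eta = partial_eta_squared(f, df1, df2)
        print(f"{label}: {eta:.3f} (reported {reported}); df_error/df_effect = {df2 / df1:.2f}")
```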
The conjunction analysis of contrasts 1 and 2 in Table 1 revealed activation in the anterior rostral medial prefrontal cortex, shown superimposed on axial slices of a high-resolution T1-weighted MR template at z = 4 through 20 mm.
Irrespective of the preceding utterance by SELF or PC, the Reject compared with Accept contrast activated the left V1.

PPI
We conducted PPI analysis with seed regions in the right and left V1 and the arMPFC as the modulator. With the left V1 seed, we found enhanced connectivity with the bilateral primary motor cortex (M1, face area). Connectivity modulation specific to the right V1 seed was found in the bilateral M1, left basal striatum, bilateral dorsal premotor cortex (PMd), supplementary motor area (SMA), anterior cingulate cortex (ACC), and insula ( Fig 6 , Table 5 ).

Behavior
The present findings indicate that the teacher's acceptance increases pleasure ratings. This is consistent with a previous study by Izuma and colleagues, who found that acceptance acts as a social reward ( Izuma et al., 2008 ). We additionally found that the decrease in pleasure ratings caused by rejection was also enhanced by social contingency. As social rejection is psychologically "painful" ( Eisenberger et al., 2003 ), rejection contingent upon self-action enhanced the decrease in pleasure. These findings are concordant with the notion that the detection of social contingency is tightly linked with social acceptance and rejection. Finally, gestures were more effective than words in evoking positive emotions in the Accept condition. This finding is consistent with the notion that a reaction contingent upon one's own action is rewarding per se ( Sumiya et al., 2017 ), and underscores the importance of gesture as non-verbal information during social interaction.

arMPFC
In this experiment, the teacher's response, whether gestural or verbal, was presented in the context of either the self's (SELF condition) or the other's (PC condition) action. The conjunction analysis of the contrasts (SELF - PC) × (Accept - Control) and (SELF - PC) × (Reject - Control) ( Fig 3 ) identifies social contingency detection irrespective of valence. Thus, the arMPFC cluster revealed by the conjunction analysis is related to the evaluation of self-relevant signals contingent upon self-action. This finding is consistent with a meta-analysis showing that both positive and negative effects of subjective value on the BOLD signal were observed in the dorsomedial prefrontal cortex, in addition to the anterior insula, dorsal and posterior striatum, and thalamus ( Bartra et al., 2013 ).
This cluster in the arMPFC is distinct from the medial orbitofrontal cortex (mOFC), which was activated by the contrast of acceptance compared with the control condition, irrespective of the preceding utterance of SELF or PC. As the mOFC responds predominantly to positive subjective value ( Bartra et al., 2013 ), the present findings indicate that the neural substrates of self-relevance detection and of valuation are spatially distinct within the medial prefrontal region ( Bartra et al., 2013 ).
Fig. 3. Neural substrates of social contingency detection.
Table 3. Results of the Accept > Reject contrast by conjunction analysis (conjunction of contrasts 3 & 4 in Table 1). Irrespective of the preceding utterance of SELF or PC, acceptance compared with the Reject condition activated the medial orbitofrontal cortex and the right V1.

Valence coding in the early visual cortices
The early visual cortices are related to the valence of social signals. Since the first report ( Shuler and Bear, 2006 ), the primary visual cortex has been known to respond to reward in rats ( Shuler and Bear, 2006 ), macaque monkeys ( Arsenault et al., 2013 ), and humans ( Rossi et al., 2017 ). The human event-related potential (ERP) study by Rossi et al. (2017) showed that monetary loss elicited higher neural activity in V1 than reward did, whereas reward influenced post-perceptual processing stages (P300); thus, both negative and positive values are encoded ( Rossi et al., 2017 ). Miskovic and Anderson (2018) argued that a sensory system might play a direct role in representing the pleasantness component of perception, in conjunction with a shared valence code independent of its sensory origin, as represented by heteromodal limbic and paralimbic regions, including the amygdala, anterior insula, pre-supplementary motor area, and portions of the orbitofrontal cortex ( Satpute et al., 2015 ).
A novel and unexpected finding was the valence-dependent laterality of the activation: positive responses (acceptance) activated the right visual cortex, whereas negative responses (rejection) activated the left visual cortex. Direct comparison of acceptance and rejection accentuated this laterality: rejection deactivated the right V1, and acceptance deactivated the left V1. Thus, the bilateral V1, through interhemispheric interaction, may encode valence.
Previous studies in reinforcement learning suggested that visual cortical activation caused by feedback might reflect attentional processing rather than reward per se ( Chase et al., 2015 ). In the present study, by contrast, attention toward positive responses was presumed to be equivalent to attention toward negative ones, yet responses of different valence evoked lateralized activation of the visual cortex. Furthermore, this effect was also observed when the feedback was given in words. Thus, the valence-dependent laterality of V1 activation indicates that the effect is not a domain-general modulation of visual processing. Instead, the early visual cortices are part of the valence-specific representation of the social signal. This notion is consistent with the clinical observation that early exposure to abuse or neglect reduces the gray matter volume of the early visual cortex ( Tomoda et al., 2009 ). These abuse-associated neurobiological alterations may reflect not simply damage but adaptive processes. Social rejection, in contrast to social acceptance, is an essential component of abuse. Thus, the neural circuit that processes these social signals may respond to an excessive input of rejection through the top-down pathway from the arMPFC to the early visual cortices, leading to experience-dependent plastic change. Recent studies of reactive attachment disorder associated with early childhood maltreatment showed reduced gray matter volume in the left primary visual cortex (-20, -74, 8) ( Fujisawa et al., 2018 ; Shimada et al., 2015 ), consistent with the present findings.

PPI analysis
PPI analysis was conducted with the seed regions in the right and left V1 and the arMPFC as the modulator. As the major role of the sensory network is to capture the significance of cues for behavior in a given context ( Miskovic and Anderson, 2018 ), we expected the valence-coding areas to affect the motor control system, modulated by the context of self-participation, which is represented by arMPFC activity. With both the right and left V1 seeds, we found enhanced connectivity with the bilateral primary motor cortex (M1, face area). Connectivity from V1 toward M1 increased only after self-action, was enhanced by the activation of the arMPFC, which codes self-other distinction, and did not depend on the valence coded by the bilateral V1. We therefore interpret this finding as reflecting a self-relevant, valence-independent feedback signal for self-action.
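As a minimal sketch of the model behind a physio-physiological interaction, the target-region signal (here M1) is regressed on the seed signal (V1), the modulator signal (arMPFC), and their product; the coefficient of the product is the interaction of interest. This assumes ordinary least squares on simulated, mean-centered BOLD-like series, and the function name is hypothetical. SPM's implementation forms the product at the estimated neural level after deconvolving the hemodynamic response, a step this sketch omits.

```python
import numpy as np

def physio_physiological_interaction(seed, modulator, target):
    """Fit target = b0 + b1*seed + b2*modulator + b3*(seed*modulator) by OLS.

    All inputs are 1-D time series of equal length. Seed and modulator are
    mean-centered before forming their product so that b3 is not confounded
    with the main effects. Returns the array [b0, b1, b2, b3].
    """
    seed = np.asarray(seed, float) - np.mean(seed)
    modulator = np.asarray(modulator, float) - np.mean(modulator)
    target = np.asarray(target, float)
    X = np.column_stack([np.ones_like(seed), seed, modulator, seed * modulator])
    betas, *_ = np.linalg.lstsq(X, target, rcond=None)
    return betas

# Simulated data with a known interaction coefficient of 0.5.
rng = np.random.default_rng(0)
n = 200
v1 = rng.standard_normal(n)      # simulated seed series (V1)
armpfc = rng.standard_normal(n)  # simulated modulator series (arMPFC)
v1c, mc = v1 - v1.mean(), armpfc - armpfc.mean()
m1 = 0.3 + 0.2 * v1c + 0.1 * mc + 0.5 * v1c * mc + 0.05 * rng.standard_normal(n)
b0, b_seed, b_mod, b_ppi = physio_physiological_interaction(v1, armpfc, m1)
```

A positive b_ppi means that the V1 to M1 coupling grows as arMPFC activity rises; with real data, the seed and modulator series would be extracted from the respective regions of interest.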
The M1 is critical in speech control. During verbal communication, up to 100 muscles must be coordinated. Two distinct cerebral pathways act upon the motor nuclei of the lower cranial nerves in humans ( Ackermann and Ziegler, 2010 ). One is the phylogenetically old limbic vocalization system, which projects from the ACC via the periaqueductal gray matter and a pontine vocal pattern generator to the cranial nerve nuclei of the brainstem ( Hage and Jürgens, 2006 ; Jürgens, 2002 ). In contrast, each of the brainstem nuclei projecting to the vocal tract also receives input from both cerebral hemispheres, particularly from the primary motor cortex, whose integrity is crucial for skilled motor tasks, including word utterance ( Ackermann and Riecker, 2010 ). Considering that uttering words in a foreign language requires explicit voluntary control, the involvement of the M1 as the target of feedback is reasonable.
Fig. 6. Physio-physiological interaction analysis with the seed regions in the right and left V1 and the anterior rostral medial prefrontal cortex as the modulator.
Given that PPI analysis does not support conclusions about clear-cut directionality ( Staudinger et al., 2011 ), two possibilities may be considered. First, V1 may modulate functional connectivity between the arMPFC and M1, as this connectivity is related to the level of self-esteem represented by the arMPFC ( Chavez and Heatherton, 2015 ). However, self-esteem was unlikely to change rapidly in our experiment. Rather, our results support the second possibility: the input of signals evoked in V1 by the teacher's response to action processing in M1 is modulated by the signal from the arMPFC, yielding a feedback signal in the case of self-action. In this view, the PPI effect indicates that the arMPFC sends a signal to M1 that changes the gain of its neural response to inputs from V1 ( Stephan et al., 2008 ). This gain-control mechanism may change the pattern of interactions between M1 and V1.
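The gain-control interpretation refers to nonlinear dynamic causal modeling ( Stephan et al., 2008 ), in which the activity of one region modulates the coupling between two others. The present analysis did not fit such a model; the equations below only state its general form and, as our own illustration, write out the M1 row with V1 as the source and the arMPFC as the gating region. Here $z$ denotes neuronal states, $u$ experimental inputs, $A$ fixed coupling, $B^{(i)}$ modulation by input $i$, $D^{(j)}$ modulation by the activity of region $j$, and $C$ driving inputs.

$$
\frac{dz}{dt} = \Big( A + \sum_{i=1}^{m} u_i B^{(i)} + \sum_{j=1}^{n} z_j D^{(j)} \Big) z + C u
$$

$$
\frac{dz_{\mathrm{M1}}}{dt} = \Big( a_{\mathrm{M1},\mathrm{V1}} + d^{(\mathrm{arMPFC})}_{\mathrm{M1},\mathrm{V1}} \, z_{\mathrm{arMPFC}} \Big) z_{\mathrm{V1}} + \ldots
$$

The omitted terms ($\ldots$) include input-dependent modulation and the other regions' contributions. Under this reading, a positive $d^{(\mathrm{arMPFC})}_{\mathrm{M1},\mathrm{V1}}$ means that arMPFC activity raises the gain of the M1 response to V1 input, which is the pattern a physio-physiological interaction effect can reflect.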
We did not find any difference in the distribution of enhanced connectivity between the left and right V1 seeds; that is, there was no valence-specific enhancement. This finding suggests that the feedback of the social signal to the motor system is valence independent. As the feedback to M1 was modulated by the activity of the arMPFC, which represents social contingency, the partner's response is gated by the arMPFC, acting as a social contingency detector, according to the context of self-action. This mechanism is functionally suited to regulating self-action irrespective of the valence of the feedback.
This valence-independent feedback pathway toward M1 contrasts with the motivation-based enhancement of learning. Praise or social acceptance carries positive valence, as this study showed, and the motivational enhancement is likely mediated by the reward system, which is activated when praise is perceived ( Izuma et al., 2008 ) and when it is expected ( Izuma et al., 2010 ).

The difference from Sumiya et al. (2017)
This study adopted the experimental setup of Sumiya et al. (2017). They investigated the neural substrates of positive emotion evoked by the audience's positive responses (laughter) contingent upon self-action (uttering the punchline of a humorous vignette). Participants were instructed "to read the punchline funnily" to amuse the audience. Thus, participants expected a positive response to their performance, leading to activation of the reward system that depended on the degree of the audience's positive response ( Sumiya et al., 2017 ). They also found enhanced connectivity from the auditory cortex toward the ventral striatum, which was modulated by the activity of the arMPFC.
In contrast, the present study focused on the feedback nature of the social response contingent upon self-action: the teacher-student relationship was chosen as the experimental setup, and the teacher's response was characterized as teaching feedback on the student's utterance of pseudo-English words. Upon uttering a word, participants expected evaluative feedback (acceptance or rejection) relative to the "authorized standard," which the teacher was presumed to possess. This evaluative character of the partner's response distinguishes the present study from the previous report ( Sumiya et al., 2017 ) and resulted in a different pattern of signal transmission, directed toward the motor control system rather than the reward system.
As Sumiya and colleagues ( Sumiya et al., 2017 ) manipulated the degree of positive response from the audience, it was difficult to differentiate the valence coding in the auditory cortices from the intensity effect. In contrast, the current study introduced positive (acceptance) and negative (rejection) feedback, allowing a clear depiction of neural substrates of the valence coding in the early visual cortices.

Limitations and future studies
This study focused on the feedback pathway of socially contingent signals. Accordingly, we did not evaluate the effect of feedback on performance, including the effect of feedback valence on learning, which warrants a future study.
The pleasure ratings were higher in the non-verbal than in the verbal condition. To identify the corresponding neural substrates, we contrasted the positive non-verbal response (nodding) to self-action with positive verbal acceptance ("GOOD"). We found no activation other than in the MT/V5 region, which represents the head motion of nodding. The control for the non-verbal condition did not include head motion, which is a limitation of the present study.
Another issue is interaction with a virtual agent. A person's belief that the agent is a computer program affects social-cognitive processes during human-virtual agent interaction ( Caruana et al., 2017 ; Schurz et al., 2014 ). In this experiment, all participants believed the cover story; thus, the effect reported here reflects a perception created by real person-to-person interaction. This does not rule out the possibility of effective communication with a virtual agent. The issue is critical to advances in artificial intelligence, as virtual agents are already starting to play an active role in various fields. Effective communication between people and virtual agents, which can induce positive emotion and enhance social bonding in human-virtual agent interaction, has become important ( Numata et al., 2020 ). Numata et al. (2020) found that positive congruent responses by a virtual agent induced positive emotions in participants, overcoming the effect of believing that the agent is a computer program. The relationship between social contingency and beliefs about the agent is an outstanding issue for future studies.

Conclusion
In conclusion, we showed that socially contingent, self-relevant signals act as feedback to the motor control system, and this process is mediated by the early visual cortex. These results shed light on the neural network underpinning how social valuation influences learning.

Declaration of Competing Interest
None.