A systematic review of the neural correlates of multisensory integration in schizophrenia

Multisensory integration (MSI), in which sensory signals from different modalities are unified, is necessary for our comprehensive perception of and effective adaptation to the objects and events around us. However, individuals with schizophrenia suffer from impairments in MSI, which could explain typical symptoms like hallucination and reality distortion. Because the neural correlates of aberrant MSI in schizophrenia help us understand the physiognomy of this psychiatric disorder, we performed a systematic review of the current research on this subject. The literature search concerned studies that investigated MSI in diagnosed schizophrenia patients compared to healthy controls using brain imaging. Seventeen of 317 identified studies were finally included. To assess risk of bias, the Newcastle-Ottawa quality assessment was used, and the review was written according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The results indicated that multisensory processes in schizophrenia are associated with aberrant, mainly reduced, neural activity in several brain regions, as measured by event-related potentials, oscillations, activity and connectivity. The conclusion is that a fronto-temporal region, comprising the inferior frontal gyrus, middle temporal gyrus and superior temporal gyrus/sulcus, along with the fusiform gyrus and dorsal visual stream in the occipital-parietal lobe, are possible key regions of deficient MSI in schizophrenia.


Introduction
We constantly encounter an abundance of sensory information that has to be successfully organized for us to be able to make sense of it. The converging processes of sensory modalities (e.g., auditory, visual, tactile modalities) required to generate a meaningful and coherent perception underlie the concept of multisensory integration (MSI; Talsma et al., 2010). For example, during a conversation with a friend in a busy restaurant, you will perceptually process your friend's voice (i.e., auditory stimuli) and articulations (i.e., visual stimuli) concurrently and rapidly to integrate them. This integration will improve your speech perception because MSI enhances perceptual acuity and improves detection, discrimination and response speed (Wallace et al., 2020). MSI not only helps us avoid cognitive overload and create meaning in the constant flood of sensory information (Jensen et al., 2019), but also plays a crucial role in our daily functioning and well-being by guiding our responses to the complex outer world (Tseng et al., 2015).
Different types of research paradigms have been used to study the effects of MSI on behavior. Some focus on multisensory illusions that show how information from different sensory modalities can fuse together into one percept. One example is the McGurk effect, in which a video of a person saying one phoneme (e.g., 'ga') is dubbed with a recording of another phoneme (e.g., 'ba'), resulting in the perceived illusion of a mixed phoneme (e.g., 'da') (McGurk & MacDonald, 1976). Other studies focus on the performance enhancement that multisensory stimuli enable. One example is the redundant signals effect (RSE), which shows that responses are faster and more precise with stimuli presented in multiple sensory modalities compared to a single modality (Hershenson, 1962; Kinchla, 1974). Groundbreaking electrophysiology studies of neurons in the superior colliculus discovered important principles for MSI (Meredith and Stein, 1983, 1986; Stein and Stanford, 2008). According to the principle of inverse effectiveness, multisensory enhancement is greater if the unisensory signals are of low intensity. In addition, multisensory facilitation is maximal when stimuli from different modalities are presented at the same time (temporal rule) at around the same place (spatial rule), and decreases with increased interstimulus onset (Stone et al., 2014).
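The principle of inverse effectiveness can be illustrated with a small numerical sketch. The index used here, the percentage gain of the multisensory response over the strongest unisensory response, is a commonly used enhancement measure in the single-neuron literature; the function name and all response values are hypothetical and chosen only for illustration.

```python
def enhancement_index(multisensory, unisensory_a, unisensory_b):
    """Percentage gain of the multisensory response over the best unisensory one.

    Inputs are response magnitudes (e.g., mean spike counts); the values used
    below are invented for illustration and come from no study.
    """
    best_unisensory = max(unisensory_a, unisensory_b)
    if best_unisensory <= 0:
        raise ValueError("the best unisensory response must be positive")
    return (multisensory - best_unisensory) / best_unisensory * 100.0


# Hypothetical weak stimuli: small unisensory responses, large relative gain.
weak = enhancement_index(multisensory=6, unisensory_a=2, unisensory_b=1)

# Hypothetical strong stimuli: large unisensory responses, smaller relative gain.
strong = enhancement_index(multisensory=15, unisensory_a=10, unisensory_b=8)

print(f"weak stimuli: {weak:.0f}% enhancement")
print(f"strong stimuli: {strong:.0f}% enhancement")
```

In this made-up case the relative gain is larger for the weak stimuli than for the strong ones, which is the pattern that inverse effectiveness describes.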

Neural correlates of multisensory integration
Numerous brain-imaging studies of human and non-human animals (e.g., primates, rodents and cats) have provided evidence for MSI in, and anatomical pathways between, several different locations in the brain. These locations include both higher-order and lower-order cortical areas that are multisensory in nature. Important higher-order association areas are the temporal (i.e., superior temporal sulcus), parietal (i.e., intraparietal sulcus), and frontal (i.e., premotor cortex, prefrontal cortex) cortical regions (Cappe et al., 2012a; Ghazanfar and Schroeder, 2006). For example, the superior temporal sulcus (STS) contains neurons with multisensory properties (i.e., bimodal, trimodal or subthreshold neurons) that respond to stimuli in auditory, visual and somatosensory modalities (Baylis et al., 1987; Desimone and Gross, 1979), and has anatomical connections with visual and auditory cortical areas as well as the prefrontal cortex (Cappe et al., 2012a). Multiple neuroimaging studies on humans have shown that STS is responsive to multisensory stimuli, indicating that it possibly plays a general role in perception of speech and biological motion (see Calvert, 2001, for a review). Functional magnetic resonance imaging (fMRI) studies on humans indicate early interactions between lower-level sensory cortical areas in multisensory processing (see Cappe et al., 2012a, for a review), and the superior colliculus is an important subcortical site for multisensory integration (Meredith and Stein, 1983, 1986; Stein and Stanford, 2008). Other subcortical structures like the striatum, cerebellum, amygdala and thalamus are also involved in multisensory processes (Engel et al., 2012).
However, the neural foundations of multisensory processes, constituting oscillations, networks and functional connectivity, are still not well understood (Keil and Senkowski, 2018). The classical view suggests that integration occurs by convergence (Engel et al., 2012), and this means that integration is a feedforward and hierarchical process in which sensory information is first processed in primary sensory cortices and then integrated in higher-order association areas and specialized subcortical regions (Meredith, 2002; Stein and Meredith, 1993). Integration by convergence cannot alone explain multisensory processing because, for example, cross-modal interactions already take place in primary cortices (Engel et al., 2012; Ghazanfar and Schroeder, 2006; Kayser and Logothetis, 2007). In their review, Keil and Senkowski (2018) present an integrative framework for the role of neural oscillations in multisensory integration built on recent studies. They propose that different frequency band-power (e.g., alpha-, beta-, gamma-, delta-, and theta-band) and functional connectivity networks are associated with different multisensory processes. For example, bottom-up processes, top-down processes and predictions across sensory modalities modulate integration, leaving MSI highly flexible and context-dependent. Nevertheless, our understanding of the neural correlates is still rather poor.

MSI and schizophrenia
Schizophrenia is a psychiatric condition characterized by both positive and negative symptoms. Hallucination, delusion and disorganization are referred to as positive symptoms while negative symptoms are a loss of premorbid functions such as loss of motivation and avolition (Bodén, 2016). The associated cognitive impairments in schizophrenia are hypothesized to result from atypical neural communication (van den Heuvel and Fornito, 2014), and several fMRI studies have identified altered anatomical neural connectivity in various brain regions (Allen et al., 2008; Crossley et al., 2016). A consequence of these neurocognitive alterations is an inability to adapt optimally for normal functioning, with deficits in sensory integration (Heinrichs and Buchanan, 1988; Steinmann et al., 2019).
Many studies confirm that deficits in integration of sensory information in schizophrenia are apparent in visual, auditory and tactile modalities (e.g., de Gelder et al., 2003; Ferri et al., 2014; Vlcek et al., 2014). The impaired sensory perception is suggested to be related to passivity symptoms like hallucinations (Surguladze et al., 2001). The internal forward model is a widely accepted explanatory model of passivity symptoms in schizophrenia. It suggests that the characteristic passivity symptoms in schizophrenia derive from an inability to make accurate predictions about the perceptual or sensory outcome of one's own intentional actions (Frith, 2005). Postmes et al. (2014) also argue that passivity symptoms in schizophrenia arise due to perceptual incoherence. They theorize that hallucination represents a coping mechanism to resolve incoherent multisensory experiences. Since sensory disturbances are apparent across several modalities and hallucinations are a multimodal phenomenon, Wallace et al. (2020) state that the understanding of these deficits will benefit from a multisensory perspective.
A growing number of studies have investigated MSI in schizophrenia with different paradigms and several of them present evidence of significant behavioral consequences of impaired audiovisual integration (e.g., de Jong et al., 2009; Ross et al., 2007; Stevenson et al., 2017). Facilitation effects on reaction times do not occur in schizophrenia patients in the same way as they do in healthy controls (Williams et al., 2010). In addition, patients experience the McGurk effect more rarely, which indicates less integration interference (Pearl et al., 2009). Findings reveal abnormalities in MSI, with a widened temporal binding window for individuals with schizophrenia (e.g., Foucher et al., 2007; Zhou et al., 2018), and impaired MSI appears to be most evident in speech-related audiovisual stimuli and worsened in a noisy environment (Wallace and Stevenson, 2014).
However, the results have not been consistent. The results of de Boer-Schellekens et al. (2014) indicate no deficits in MSI during tasks only engaging lower-level functions, supporting the notion that diminished sensitivity to visual temporal order impinges on MSI rather than indicating deficits in MSI alone. de Gelder et al. (2003) found no difference in performance on simple MSI tasks between individuals with schizophrenia and healthy controls, but impairments in MSI were evident in schizophrenia during tasks involving speech. Williams et al. (2010) found impaired MSI using a basic audiovisual paradigm consisting of a simple detection task without any speech components. Still, some studies show no impairments in MSI (e.g., Aine et al., 2017).
In the meta-analysis of Tseng et al. (2015), it was concluded that impairments in MSI for non-emotional stimuli are evident in schizophrenia. Two studies reported increased multisensory facilitation effects with stimuli containing emotion-triggering aspects, for example a sad face or the sound of laughter (de Gelder et al., 2005; Van den Stock et al., 2011), while others report decreased integration of emotional stimuli (e.g., de Jong et al., 2009). Even though there are contradictory results, Tseng et al. (2015) lean towards suggesting impaired MSI for emotional stimuli as well.
It has been proposed that it is relevant to look at these neurocognitive impairments from a perspective of larger scale neural circuits and cortical networks rather than just focus on local neural areas or specific brain regions (Onitsuka et al., 2013). New models are needed to explain the neurological variations in schizophrenia based on higher function, neural connectivity and brain rhythms rather than just anatomy and basic functions. One established model concerning this is the so-called dysconnectivity hypothesis that suggests abnormal interaction between different brain regions, especially a fragmentation of frontotemporal connection, to be the reason for positive symptoms in schizophrenia (e.g., Friston et al., 2016).
Research on schizophrenia patients with electroencephalography (EEG) reveals aberrant oscillatory activity in several frequencies, including theta/delta bands (Boutros et al., 2008) and alpha/gamma bands (White et al., 2010). For instance, a decrease in synchronized gamma and beta power has been found when exposing the participants to different stimuli (Uhlhaas and Singer, 2013). When investigating neural oscillations in individuals with schizophrenia presented with auditory stimuli, several studies have found deficits in steady-state evoked potentials in gamma frequencies but also to some extent at lower frequency bands (e.g., Krishnan et al., 2009; Kwon et al., 1999). Deficits found in evoked oscillations when schizophrenia subjects processed visual information have been proposed to be a sign of reduced ability to arrange incoming sensory information accurately to oscillatory activity (e.g., Uhlhaas and Singer, 2010). Kopell et al. (2011) showed that beta-receptor rhythms may mediate how healthy individuals integrate information from different sensory modalities. Together with the suggestion that these beta-receptor rhythms are weakened in the schizophrenic brain (Pittman-Polletta et al., 2015), a possible connection between oscillation alterations in schizophrenia and MSI is put forth. Furthermore, impairment in oscillations could possibly provide explanations for suggested problems with functional connectivity of cortical networks in schizophrenia (Uhlhaas and Singer, 2010).

Investigation rationale
MSI is associated with specific neural correlates, brain regions and oscillation rhythms (e.g., Cappe et al., 2012b; Keil and Senkowski, 2018). Contemporary behavioral studies indicate impairment in MSI for the schizophrenia population (Tseng et al., 2015), and multisensory impairments are evident in brain-imaging studies (e.g., Jardri et al., 2009; Silbersweig et al., 1995). It is relevant to continue to map disorder-related brain activity while processing multisensory stimuli in order to understand how the pathogenesis of schizophrenia hinders the afflicted from processing incoming sensory information in a coherent and cohesive way (Mäntylä et al., 2018). A greater understanding of how sensory processes are impacted in the brain of the schizophrenia population could be of use in treatment, and several studies suggest that new ways to help diagnose schizophrenia in the future are to be found in different types of brain imaging (e.g., Boutros et al., 2008; Kambeitz et al., 2015; Salvador et al., 2019). Because different brain-imaging techniques can shed further light on how MSI is expressed in schizophrenia, it is relevant to compile the existing literature in this field. It can also be considered that the most valuable studies are those directly comparing how the neural correlates during MSI differ between the population with schizophrenia and the healthy population. Accordingly, the aim of the present study was to perform a systematic review of this research with such direct comparisons of neural activity and connectivity. To our knowledge, there is no previous systematic review addressing this.

Eligibility criteria
The target population was individuals diagnosed with schizophrenia or schizoaffective disorder (SP) according to the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Statistical Classification of Diseases and Related Health Problems (ICD). The control group consisted of healthy individuals without current acute, severe or chronic disease. These healthy controls (HC) constitute the comparative reference on multisensory tests.
We selected a broad definition of multisensory tests based on MSI research to be able to include as many relevant articles as possible. If the study used an acceptable MSI paradigm, where two different sensory modalities were examined at the same time, it was included for further full-article assessment. All different sensory modalities were included (e.g., visual, auditory, olfactory, tactile, and taste). The outcome had to be measured with some type of brain-imaging technique (e.g., EEG, magnetoencephalography (MEG), fMRI).
Studies were included only if peer-reviewed, original articles in English, and accessible to the authors. Studies were excluded if they included a target population with mixed diagnoses, for example schizophrenia and/or schizoaffective disorder combined with other participants with bipolar or psychotic disorders.

Search strategy and selection process
The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and as suggested by PRISMA (Moher et al., 2009, 2015) the selection phase in this systematic review consisted of four different stages (Fig. 1). First, articles were identified through a database search. Second, the articles with duplicates removed were screened according to the selection criteria concerning title plus abstract. Third, the eligibility of each full-text article was rated according to predetermined criteria. Finally, approved articles were included in the systematic review.
The database search was performed in several different databases: PsycINFO, PubMed and Web of Science. Keywords from relevant articles and reviews were chosen as search terms. Search terms used were ("multisensory" OR "multi-sensory" OR "multimodal" OR "crossmodal") AND ("integration") AND ("neural" OR "cortical" OR "ERP" OR "fMRI" OR "EEG" OR "brain" OR "PET" OR "MEG" OR "oscillations") AND ("schizophrenia" OR "schizo*" OR "schizoaffective" OR "psychosis"). Only peer-reviewed, original articles in English were included and records were searched from the earliest available until January 2020. In total, 317 records were identified through the database search. Two of the authors made the selection of articles, and disagreements were resolved by discussion between all three authors.
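As a hedged illustration of how the Boolean search string above is composed, the following sketch joins the terms within each concept group with OR and the groups with AND. The term lists are copied from the text; the function and variable names are hypothetical, and each database may require its own field tags or truncation syntax, which are not modeled here.

```python
# Term groups copied from the search strategy described in the text.
TERM_GROUPS = [
    ["multisensory", "multi-sensory", "multimodal", "crossmodal"],
    ["integration"],
    ["neural", "cortical", "ERP", "fMRI", "EEG", "brain", "PET", "MEG",
     "oscillations"],
    ["schizophrenia", "schizo*", "schizoaffective", "psychosis"],
]


def build_query(groups):
    """Join terms with OR inside each group and join the groups with AND."""
    parts = []
    for terms in groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        parts.append(f"({quoted})")
    return " AND ".join(parts)


QUERY = build_query(TERM_GROUPS)
print(QUERY)
```

Keeping the concept groups as data makes it easy to see that a record must match at least one term from every group to be retrieved.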
Identification was followed by screening in which duplicates were removed and titles plus abstracts were screened to check whether they met inclusion criteria. The reference lists from included articles were also searched and their relevant abstracts were screened for inclusion. Articles that did not meet the inclusion criteria were excluded (e.g., only unisensory or not clearly multisensory, no MSI task, systematic reviews and meta-analysis, animal studies, no diagnosis, no control group, did not measure outcome with brain-imaging techniques or inaccessible). Twenty-nine articles were assessed for eligibility. Two reviewers checked the eligibility of these full-text articles and inclusions of articles were based on consensus. Twelve additional articles were finally excluded. Three articles did not use DSM or ICD for diagnosis and two articles only included first time psychosis. Seven articles were excluded for several reasons, including the actual MSI task not studied with brain imaging (though other tasks were, n = 1); stimuli not presented at the same time and therefore not classified as MSI task (n = 2); primarily concerned other cognitive functions than MSI leading to difficulties in interpreting the results (n = 2); not a primary source (n = 1); and not describing any multisensory results (n = 1). Thus, 17 articles were included in the systematic review (see Fig. 1).
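The full-text stage of the selection process can be checked with simple arithmetic. The sketch below uses only the counts reported in this section; the dictionary keys are shortened labels for the stated exclusion reasons, and the variable names are illustrative.

```python
# Counts taken from the text of the selection process.
full_text_assessed = 29

# Exclusions at the full-text stage, by reason (labels abbreviated).
full_text_exclusions = {
    "diagnosis not based on DSM or ICD": 3,
    "only first time psychosis": 2,
    "MSI task not studied with brain imaging": 1,
    "stimuli not presented at the same time": 2,
    "primarily concerned other cognitive functions": 2,
    "not a primary source": 1,
    "no multisensory results described": 1,
}

excluded = sum(full_text_exclusions.values())
included = full_text_assessed - excluded

print(f"excluded at full text: {excluded}")
print(f"included in review: {included}")
```

The per-reason counts sum to the twelve excluded articles, and subtracting them from the 29 assessed articles gives the 17 included studies.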

Quality assessment
The Newcastle-Ottawa Quality Assessment Scale (NOS; Wells et al., 2012) was used to assess the risk of bias in the results of the 17 included articles (see Appendix 1). The NOS comprises eight points or items grouped into three different sections, which are comparability, selection, and exposure. Each item was rated for a maximal score of one star and the maximal total score was eight. Like other systematic reviews, articles rated less than five out of eight were excluded (e.g., de Dieuleveult et al., 2017). The NOS was chosen since it is often used for systematic reviews and it is considered credible by the authors. Two authors performed the quality assessment of the selected studies, and there were no disagreements.

Data extraction and analysis of results
The collected studies were primarily grouped based on what brain-imaging technique had been used. Key characteristics in the data were described according to first author, title, publication year, number of participants, cognitive tests, participant characteristics, experiments, brain-imaging analysis, brain-imaging results, and performance when appropriate (see Appendix 2). Two authors made the data extraction and there were no disagreements.
For the analysis of results, the studies were grouped into social or non-social based on type of experiment and into fMRI and EEG/MEG based on brain-imaging technique. The results of how the neural correlates during MSI differ between the schizophrenia population and the healthy population were summarized for each brain-imaging technique.
Since the experimental designs differed, a table (see Table 1) was created that presents the studies according to different conditions compared, brain-imaging results, performance results, unisensory processing, and brain regions.

Quality assessment
The included studies all proved good quality according to the NOS (see Appendix 1). Ten studies had eight out of eight stars, four studies had seven stars and three studies had six stars. Thus, none of the studies had less than five stars and all 17 studies remained included.
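The star distribution reported above can be tallied against the exclusion threshold stated in the methods (fewer than five of eight stars). This is a minimal sketch using only the reported counts; the variable names are illustrative.

```python
# Number of studies at each NOS star rating, as reported in the text.
studies_by_stars = {8: 10, 7: 4, 6: 3}

MIN_STARS = 5  # studies rated below this were to be excluded

n_studies = sum(studies_by_stars.values())
retained = sum(n for stars, n in studies_by_stars.items() if stars >= MIN_STARS)
lowest_rating = min(studies_by_stars)

print(f"studies rated: {n_studies}")
print(f"studies retained: {retained}")
print(f"lowest rating: {lowest_rating} stars")
```

Because the lowest rating is six stars, every rated study clears the threshold and all 17 remain in the review.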

Experimental design
The studies were sorted according to brain-imaging technique, fMRI (n = 7) and EEG (n = 10), and type of experiment, non-social (n = 7) and social (n = 10). The included studies compared unisensory to multisensory conditions (n = 5) or different multisensory conditions with one another (e.g., congruent vs incongruent, n = 5) or a combination of these (n = 7). All included studies used audiovisual stimuli only (see Table 1).
Experiments were considered non-social if they did not include a task concerning multisensory stimuli that consisted of speech, faces or other social stimuli (n = 7). Of the seven studies labeled as non-social, two used a simple multisensory detection task (Hanlon et al., 2016; Wynn et al., 2014). One study only included the instruction to "look and listen" during presentation of multisensory stimuli (Braus et al., 2002). Four studies involved audiovisual discrimination tasks (Sanfratello et al., 2018; Stone et al., 2011, 2014), two of which included manipulations of synchronization of audiovisual stimuli (Sanfratello et al., 2018; Stone et al., 2014). As mentioned in the introduction, it has been shown that visual deficits in SP are associated with deficits in the dorsal stream (i.e., "the where pathway" in the parietal lobe). By placing a multisensory stimulus in the peripheral view, which is the near condition in these experiments, the dorsal stream is engaged and the multisensory processing in the pathway can be recorded (Sanfratello et al., 2018; Stone et al., 2011, 2014). Sanfratello et al. (2018) and Stone et al. (2014) performed different analyses of the same dataset. In other words, they did not perform independent experiments but instead analyzed the same data.
Experiments were considered social if they included a task concerning social multisensory stimuli, like faces, voices or speech perception (n = 10). Three of the studies labeled as social used the McGurk paradigm or a version of it with some sort of discrimination task (Liu et al., 2016; Roa Romero et al., 2016a,b) or no specific task (Stekelenburg et al., 2013). One study had an emotional discrimination task based on face-voice stimuli (Müller et al., 2014). Two studies used an audiovisual speech perception paradigm including detection of semantic categories (Szycik et al., 2009, 2013). Three studies investigated multisensory stimuli in the form of speech and gesture, of which one used a content judgement task (Wroblewski et al., 2020) and two had a simple task for attentional purposes only (Straube et al., 2013, 2014). Szycik et al. (2009, 2013) derive from the same dataset, as do the two studies of Straube et al. (2013, 2014).

Participants
The total number of participants was evenly distributed between the group from the target population and the control group (SP: n = 282 and HC: n = 284).
Five of the studies also included participants diagnosed with schizoaffective disorder (Hanlon et al., 2016;Sanfratello et al., 2018;Stekelenburg et al., 2013;Stone et al., 2014;Wroblewski et al., 2020). In three of these, the exact number of participants with schizoaffective disorder was not reported. However, in the two studies where the distribution is mentioned (Hanlon et al., 2016;Stekelenburg et al., 2013) the participants diagnosed with schizoaffective disorder are few in relation to the participants diagnosed with schizophrenia (SP-AFF: n = 3 and SP: n = 32).

Performance, response times and accuracy
Four studies reported multisensory performance deficits in SP due to lower response accuracy for SP than HC in some conditions (Roa Romero et al., 2016a; Stone et al., 2011) or across conditions (Stone et al., 2014; Wroblewski et al., 2020). Two studies reported longer RTs for SP than HC (Roa Romero et al., 2016a; Straube et al., 2013). Two articles reported greater multisensory facilitation in SP compared to HC when comparing visual unisensory trials with multisensory trials (Stone et al., 2011, 2014). Eight articles found no significant group differences, or no indications of group differences, in response times (Liu et al., 2016; Müller et al., 2014; Wynn et al., 2014), accuracy (Hanlon et al., 2016; Szycik et al., 2009, 2013) or perceived illusions (Roa Romero et al., 2016b). Four articles did not report any performance data (Braus et al., 2002; Sanfratello et al., 2018; Stekelenburg et al., 2013; Straube et al., 2014).
Table 1 Overview of results. Articles are only presented once for each head category (exception brain regions), and articles that fit into more than one head category are sorted according to their primary results. Participants that are included in several studies (when performing different analysis of the same data) are only counted once in the

Correlations with clinical measures
Several studies examined correlations between EEG, MEG or fMRI results and clinical measures. The clinical measures included Positive and Negative Syndrome Scale (PANSS), Scale for the Assessment of Negative Symptoms (SANS), Scale for the Assessment of Positive Symptoms (SAPS), Brief Psychiatric Rating Scale (BPRS), evaluation of cognitive domains with Measurement and Treatment Research to Improve Cognition in Schizophrenia Consensus Cognitive Battery (MATRICS), and medication dose. The outlines of these correlations are presented below. For PANSS, SANS, SAPS and BPRS a mixture of positive, negative and no correlations were found. Two studies found negative correlations, between positive symptoms and multisensory amplitude in the left intraparietal sulcus (IPS; Sanfratello et al., 2018) and between positive symptoms and connectivity to the left inferior frontal gyrus (IFG; Straube et al., 2014). In line with these, Müller et al. (2014) reported negative correlations between P100 amplitude and total BPRS scores. For all three, decreased amplitude or connectivity was associated with more severe symptoms. Wroblewski et al. (2020) found a negative correlation between middle temporal gyrus to superior temporal sulcus (MTG-STS) connectivity and the SANS attention subscale, indicating that decreased connectivity was associated with more severe symptoms. Furthermore, SANS avolition showed positive correlation with RT to auditory, visual and audiovisual stimuli, respectively, and negative correlation with hit rate (HR) to audiovisual stimuli. That is, more severe apathy was associated with longer RTs and lower HRs (Wynn et al., 2014). Several studies did not find any significant correlations with PANSS (Hanlon et al., 2016;Roa Romero et al., 2016a,b;Stone et al., 2011), PANSS Negative (Sanfratello et al., 2018;Straube et al., 2014) or BPRS (Braus et al., 2002;Stekelenburg et al., 2013;Wynn et al., 2014). 
(A number of non-significant correlations are left out here since they are outside the scope of this systematic review.) Stone et al. (2014) reported that the relationship between gamma-band power and MATRICS scores was different for SP compared to HC. For example, correlations between multisensory gamma-band power and MATRICS scores were only found in HC.
Some of the included publications investigated the relationship between their results and medication dosage, with the majority reporting no significant correlations (Liu et al., 2016; Müller et al., 2014; Roa Romero et al., 2016a; Sanfratello et al., 2018; Stekelenburg et al., 2013; Szycik et al., 2013). The exceptions are a positive correlation between dosage and response rate in Stone et al. (2011), and a positive correlation between dosage and gamma-band power in Stone et al. (2014).

EEG and MEG findings
Ten studies recorded neural activity with EEG and MEG, including various experimental designs and aims. Group differences between SP and HC in amplitudes and latencies were evident in a majority of studies. It was mainly reported that SP had reduced amplitudes and/or aberrant latencies compared to HC, indicating deficits in MSI (Sanfratello et al., 2018; Stone et al., 2011). One study found aberrant neural activity in early event-related potential (ERP) amplitudes and latencies, while late ERPs were only reduced to visual but not multisensory stimuli (Liu et al., 2016). Applying the principles of sub- and superadditivity (i.e., multisensory ERP minus the sum of unisensory ERPs), Stone et al. (2011) found enhanced facilitation in SP while two studies did not find any group differences (Liu et al., 2016; Wynn et al., 2014). Two studies found no group differences in global field power amplitude, a measure only reflecting effects simultaneously present in a large number of electrodes (Roa Romero et al., 2016a,b).
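The additive comparison mentioned above (multisensory ERP minus the sum of unisensory ERPs) can be sketched as follows. This is not the analysis pipeline of any included study: the amplitudes are invented, the function names are hypothetical, and the labels assume a positive-going component, since for negative components the interpretation of the sign reverses.

```python
def additive_difference(av, a, v):
    """Return AV - (A + V) for each time sample of three equally long ERPs."""
    if not (len(av) == len(a) == len(v)):
        raise ValueError("all three waveforms must have the same length")
    return [m - (x + y) for m, x, y in zip(av, a, v)]


def classify(difference):
    """Label the mean difference, assuming a positive-going component."""
    mean_diff = sum(difference) / len(difference)
    if mean_diff > 0:
        return "superadditive"
    if mean_diff < 0:
        return "subadditive"
    return "additive"


# Invented amplitudes (microvolts) at three time samples; not real data.
auditory = [1.0, 2.0, 1.5]
visual = [0.5, 1.0, 1.0]
audiovisual = [2.0, 3.5, 3.0]

diff = additive_difference(audiovisual, auditory, visual)
print(diff, classify(diff))
```

In practice such differences are computed per participant and tested statistically across a time window rather than judged from a single mean.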
Oscillatory activity (i.e., theta, gamma and beta-band power) was found to be reduced in Balz et al. (2016) and Roa Romero et al. (2016a), and alpha-band power more strongly suppressed in Roa Romero et al. (2016b). In one study, both increases and decreases were found, mainly concerning greater gamma-band power in SP (Stone et al., 2014). However, this was due to failure in suppressing oscillatory power in all but one condition (i.e., gamma-band power was stronger than both baseline and HC in only one condition). This was the so-called near condition, where stimuli presented in the peripheral view were meant to activate the visual dorsal stream.
When comparing congruent to incongruent conditions of McGurk illusion trials or unisensory to multisensory trials, various interaction effects were found. SP had the opposite neural activity pattern compared to HC (Roa Romero et al., 2016b;Stekelenburg et al., 2013) or diminished differences between conditions (Liu et al., 2016;Roa Romero et al., 2016a). For example, HC displayed larger amplitudes in the congruent trials whereas SP displayed larger amplitudes in the incongruent trials, indicating that SP could not benefit from congruent multisensory information. Müller et al. (2014), on the other hand, only found reduced amplitudes for SP during emotional incongruent trials but no group differences during emotional congruent trials. This indicates that congruent sounds can affect responses and lead to neural responses similar to HC for emotional face-voice stimuli. However, the results from Müller et al. (2014) do not support the notion of SP having altered audiovisual integration since no difference was found between incongruent and congruent stimuli within the group. Stekelenburg et al. (2013) focused on phonological predictions based on temporal and content information. The authors found that visual temporal information about sound onset did not have the suppression and/or speed-up effect on N100 for SP as it had for HC. These findings corresponded to group differences in multisensory integration. Roa Romero et al. (2016a) investigated crossmodal prediction error (PE) processing when recording theta-band oscillations and found interaction effects indicating that SP had impairments in resolution and in updating of violated predictions. These two studies indicate impairments in prediction processes during MSI in SP.
Three studies investigated multisensory deficits in the visual dorsal stream, two of which derive from the same dataset (Sanfratello et al., 2018; Stone et al., 2014). Results indicate deficits in the visual dorsal stream during multisensory processing in SP (Sanfratello et al., 2018; Stone et al., 2014). In line with these results, Liu et al. (2016) reported reduced amplitudes in response to face stimuli in multisensory conditions, indicating that impairments in face processing are also evident in face-voice integration.
Five studies located group differences in multisensory processing in parietal and occipital regions (Liu et al., 2016; Müller et al., 2014; Sanfratello et al., 2018; Stone et al., 2011), and four studies located differences in frontal/temporal regions (Roa Romero et al., 2016a,b; Stekelenburg et al., 2013; Stone et al., 2014). Source estimations using a linear distributed inverse solution based on a Local Auto-Regressive Average (LAURA) in Stekelenburg et al. (2013) revealed several abnormalities (i.e., underactivation or displaced activation) in neural networks of audiovisual integration. These networks involved auditory cortex, superior temporal gyrus, middle temporal gyrus, and inferior frontal gyrus.

fMRI findings
Subcortical abnormalities were found in some studies (Braus et al., 2002; Szycik et al., 2009), and alterations in cortical areas were found in all of the fMRI studies. The neural abnormalities in the cortical areas were predominantly focused around the STS (Straube et al., 2013, 2014; Szycik et al., 2013; Wroblewski et al., 2020); the STG (Hanlon et al., 2016; Szycik et al., 2009); the IFG (Straube et al., 2013, 2014; Szycik et al., 2009, 2013; Wroblewski et al., 2020); the fusiform gyrus (FG; Szycik et al., 2009; Wroblewski et al., 2020); and the visual dorsal stream (Braus et al., 2002; Hanlon et al., 2016). Two articles found reduced connectivity in both STS and IFG for SP during audiovisual trials (Straube et al., 2014; Szycik et al., 2013), whereas one found reduced connectivity only in the MTG/STS region (Wroblewski et al., 2020) and another suggested dysfunction only in the left IFG. Yet, it is important to add that the decreased connectivity in the left IFG was mainly found in incongruent trials in Szycik et al. (2013), a condition that was not part of the experiment conducted by Wroblewski et al. (2020). This could possibly be connected to the dysfunctions in abstract, but not concrete, multisensory information processing found in the STS and bilateral IFG (Straube et al., 2014), and in the left IFG.
Overall, there are tendencies of lower responsiveness and less activation in the SP group (Braus et al., 2002; Hanlon et al., 2016; Straube et al., 2013, 2014; Szycik et al., 2009, 2013). In three studies, however, the impairments are not as distinct as expected, with many similarities between HC and SP (Straube et al., 2013, 2014; Wroblewski et al., 2020).
Two articles report that impairments in audiovisual integration originate from dysconnectivity in both the STS and the IFG, especially when SP are exposed to incongruent stimuli (Szycik et al., 2013) or when integrating stimuli with abstract content (Straube et al., 2014). Moreover, abnormal MSI processing is proposed to derive from deficits in inhibition (Hanlon et al., 2016); impairments in the verbal pathway (Wroblewski et al., 2020); a defective speech motor system in the right hemisphere and decreased lateralization of speech functions to the left hemisphere (Szycik et al., 2009); alterations in cortical as well as subcortical activation patterns, mainly focused around high-order frontoparietal cortex and thalamus (Braus et al., 2002); and a failure to activate the left IFG and posterior MTG. In addition, Straube et al. (2014) suggest that dysconnectivity of left STS and prefrontal cortex could be associated with failure to utilize the hemispheric functions adequately in a context-dependent manner.

Main findings
The main findings are summarized and presented in Table 2. In short, reduced neural activity and additional aberrant neural response patterns were evident in SP during MSI tasks in both fMRI and EEG/MEG studies, and distributed across several brain regions and networks. Furthermore, no difference in outcome was ascertained between social and non-social tasks (see Table 1).
Summary of reduced activity in neural correlates during MSI tasks:
• Reduced amplitudes in occipital/parietal ERPs during MSI tasks were revealed when using EEG or MEG (Liu et al., 2016; Sanfratello et al., 2018; Stone et al., 2011).
• Reduced oscillatory activity was found in gamma/beta-band power in occipital/parietal regions; in theta-band power in frontal regions (Roa Romero et al., 2016a); and stronger suppression in alpha-band oscillations was observed (Roa Romero et al., 2016b).
• Reduced fMRI-activation was seen both as an overall pattern (Szycik et al., 2009) and in specific brain regions (Braus et al., 2002; Hanlon et al., 2016), for example bilaterally in secondary auditory regions, the default-mode network and the dorsal visual pathway, and in the right thalamus and prefrontal cortex. In addition, reduced activation was visible during MSI tasks with an abstract content.
• Reduced connectivity was found in all conditions between STS and frontal cortex (Straube et al., 2014), as well as for the STS and MTG (Wroblewski et al., 2020), and during incongruent stimuli for left IFG to a range of different brain regions (Szycik et al., 2013).
Summary of aberrant neural activity during MSI tasks:
• Aberrant neural processes (i.e., amplitudes, oscillations, activity and connectivity) were made evident by various interaction effects where SP often demonstrated contrasting response patterns compared to HC.

Table 2. Main findings in brain-imaging results showing only results reflecting group differences and/or interaction effects in activity, connectivity, oscillations, amplitudes and latencies during multisensory-task recordings. The number of studies is indicated by n. Columns: fMRI; EEG.

Discussion
Research on MSI in schizophrenia recorded with EEG, MEG and fMRI is still a relatively unexplored, but growing, topic. In this review, seventeen studies were included and their results lead us to two main insights. First, the activity in neural correlates during MSI is aberrant and mainly reduced in SP compared to HC. Second, these differences are distributed across several brain regions and networks associated with sensory and multisensory processes. However, we could not find any clear connections between experimental design, brain-imaging technique and results in neural activity, making it difficult to draw any further conclusions about how neural and performance outcomes may be modified by experimental design and brain-imaging technique.

Aberrant and reduced neural responses in SP
In comparing SP to HC, thirteen articles reported significantly reduced neural responses in SP, whereas only three reported no differences and one reported enhanced facilitation in SP. In a majority of the studies, the differences were due to reduced activity, connectivity, oscillations and ERPs and/or various interaction effects. These results are mostly interpreted as altered audiovisual integration at the neural level in SP, which might reflect deficits or anomalies in MSI on performance level (e.g., Balz et al., 2016; Roa Romero et al., 2016b; Sanfratello et al., 2018; Szycik et al., 2009). It is also suggested that SP lack the facilitating or enhancement effect of congruent multisensory information seen in HC (e.g., Roa Romero et al., 2016a,b; Stekelenburg et al., 2013; Szycik et al., 2009, 2013). This means that healthy individuals have different neural responses to congruent compared to incongruent audiovisual information, which is helpful when interpreting and responding to the outside world. The lack of difference in neural response in SP might indicate difficulties both in integrating congruent stimuli and in differentiating between incongruent stimuli. Consequently, the perceptual benefits of MSI are impaired for SP. Aberrant neural activity has been associated with performance deficits in MSI in previous research (Magnée et al., 2009; Sass et al., 2013; for reviews see also Keil and Senkowski, 2018, Tseng et al., 2015, and Wallace et al., 2020). The inability to respond appropriately to audiovisual information could be due to impairments in prediction processing, such as deficits in making predictions about sounds from visual information (Stekelenburg et al., 2013), and/or impairments in prediction error processing (Roa Romero et al., 2016a).
This is in line with the integrative framework that Keil and Senkowski (2018) present, in which they theorize that SP have deficits in generating intersensory predictions and evaluating stimulus congruence as well as deficits in the resolution of incongruence. The internal forward model system illustrates how impairments in prediction error processing can cause positive symptoms like hallucinations in SP. In healthy individuals, because of sensory prediction the response to the sound of their own voice is attenuated in auditory cortex when speaking. This attenuation is not evident in SP, especially for those who report auditory hallucinations (Frith, 2005).
However, the dataset in this review points in several directions and some studies did not find any group differences in neural activity. Stone et al. (2011) even found that multisensory facilitation was enhanced in the SP group. In Liu et al. (2016) aberrant neural activity was found only in early ERPs while late ERPs were only reduced to visual but not multisensory stimuli. Müller et al. (2014) compared the emotional rating of congruent to incongruent face-voice stimuli and found deficits only in the incongruent condition. They concluded that their findings demonstrate that early ERPs (i.e., P100 amplitude) can be affected by congruent sound, but only when the stimuli have emotional valence and not when neutral. Furthermore, Wynn et al. (2014) did not find any difference between groups in ERPs, and Straube et al. (2013, 2014) and Wroblewski et al. (2020) found that SP did not differ from HC in brain activity or connectivity during MSI tasks involving co-verbal gestures with concrete content. Taken together, the studies suggest aberrant and reduced neural activity reflecting deficits in multisensory integration. However, at least under some circumstances, SP seem to have intact MSI with the ability to benefit from audiovisual cues, and maybe even compensate for deficits in visual processing during congruent multisensory stimuli.

Brain regions and networks associated with deficits in MSI in SP
There are multiple brain regions, both cortical and subcortical, found to be impaired in the different articles included in this review. Our results, however, indicate that the brain regions with aberrant processing during MSI in SP are the STS, the STG, the FG, the MTG and the IFG. Moreover, deficits were seen in the visual dorsal stream, with impairments mainly visible in the IFG and the STS. This is comparable to previous findings, as aberrant processing in SP has been found previously in the IFG (Arce et al., 2006), the MTG (Surguladze et al., 2006), the FG (Onitsuka et al., 2003) and the STG (Barta et al., 1990). The STS is not frequently reported as a region normally associated with the neural deficits of the SP population. However, there are studies showing a dysfunctional pattern in most parts of the brain, and describing the deficits in larger regions (i.e., temporal lobe/cortex) makes it harder to distinguish if the STS is affected or not (Shenton et al., 2001). On the other hand, the STS is a region that has been found to be very important for MSI processing (Desimone and Gross, 1979). There is a possibility that the STS function is mainly reduced in SP when responding to MSI stimuli, which could explain why this is not a region typically found to be impaired.
There are indications that the dorsal visual stream is associated with dysfunction in SP during MSI in the present review (e.g., Sanfratello et al., 2018). Reduced activity in the extrastriate visual cortex, which is part of the dorsal visual stream, has previously been seen in the SP population during perceptual organization of visual information (Silverstein and Keane, 2011). Other articles also indicate deficits in the dorsal visual stream for SP (e.g., Butler and Javitt, 2005), seemingly involved during MSI (Kaposvari et al., 2015). This could suggest that these dysfunctions are due to aberrant unisensory processing affecting MSI rather than aberrant multisensory processing in itself. However, in Stone et al. (2014), deficits were seen in the dorsal visual stream for unisensory visual stimuli besides the multisensory deficits, and the authors analyzed whether the unisensory deficits could be directly mapped onto the changes in MSI response. They could not, which indicates that the aberrant processing in the visual dorsal stream during MSI does not emerge directly from deficits in unisensory processing.
In some articles in this systematic review the impairments are located in particular in the left IFG (e.g., Szycik et al., 2013), a region also known as Broca's area. One study finds deficits only in the left STS (Straube et al., 2014), which is part of Wernicke's area. Both Broca's area and Wernicke's area are strongly associated with speech functions and language ability (Kolb and Whishaw, 2015). The articles in our systematic review showing deficits in these areas all use social multisensory stimuli accompanied by speech. Furthermore, as mentioned in the introduction, the STS bilaterally is thought to be involved in social functions such as speech perception and the perception of biological motion (Calvert, 2001). This agrees with our findings, since the disturbances in the STS are only visible in the studies using gestures or lip movements as part of the multisensory stimuli. This could indicate that these deficits are more strongly associated with speech perception or language comprehension, or perhaps social visual information, rather than multisensory processing. On the other hand, also mentioned in the introduction, the STS is an important region for multisensory processing. In addition, the left IFG, together with its right homolog, has proved to be of importance specifically for audiovisual integration of speech in healthy individuals (Curcic-Blake et al., 2013; Ojanen et al., 2005; Pekkola et al., 2005). This makes it reasonable to believe that our results do indicate reduced MSI.
Several of the articles find deficits in the IFG bilaterally (e.g., Szycik et al., 2009). The right IFG also plays a major part in functions involving attention and inhibition (Hampshire et al., 2010), and both attention and inhibition are known to be impaired in SP (e.g., Arce et al., 2006; Nuechterlein et al., 2015). However, when using experiments where the paradigm is designed to test automatic audiovisual integration processes, not dependent on attention, deficits are still visible for SP (de Gelder et al., 2003). Regarding inhibition, Hampshire et al. (2010) argue that the right IFG is more likely to be part of a network that tunes in on task-relevant stimuli rather than alone responsible for inhibitory control. They further conclude that the right IFG has a far greater role than just inhibition. Our systematic review opens up the possibility that this greater role could involve multisensory processing.
Three areas additionally observed to be involved in MSI are the FG, the MTG and the STG (Pehrs et al., 2014; Surguladze et al., 2006), which all have been indicated to be impaired in this systematic review. All of these are also found to be associated with proneness to hallucinations and the production of them (Kim et al., 2003; Kunzelmann et al., 2019; Zhang et al., 2017). As previously mentioned, the link between multisensory disturbances and symptoms like hallucinations or distorted reality is commonplace (Frith, 2005). The positive symptoms in schizophrenia are often explained with the dysconnectivity hypothesis (e.g., Friston et al., 2016), and dysconnectivity was found in several of the articles. They included dysconnectivity both in and between multiple brain regions throughout all of the lobes, including dysconnectivity in the parietal cortex. Moreover, decreased parietal connectivity has been demonstrated to lead to decreased MSI facilitation (Brang et al., 2013). Possibly, there is a link between the dysconnectivity typical of the positive symptoms in SP and the evident dysconnectivity seen with decreases in multisensory facilitation. The correlations with clinical measures found in this systematic review further indicate this link, as positive symptoms were negatively correlated with MSI amplitudes in one article (Sanfratello et al., 2018) and with connectivity in IFG in another (Straube et al., 2014). Furthermore, reduced neural activity was associated with more severe symptoms in several studies (e.g., Müller et al., 2014; Wroblewski et al., 2020).

Strengths and limitations
This systematic review followed the PRISMA guidelines with the aim of ensuring a fair and reliable selection process and improving the quality of the systematic review (Moher et al., 2009, 2015). The literature search was made in the three databases PubMed, PsycInfo and Web of Science with multiple search terms and a broad definition of MSI, and all possible brain-imaging techniques were included. Furthermore, all included articles had to meet pre-determined quality standards according to the NOS (Wells et al., 2012). Two reviewers checked the eligibility of the 29 full-text articles that emerged from the first screening, and the ultimate inclusion of 17 articles was based on consensus. Even though these measures were taken to find as many relevant articles as possible, some articles investigating MSI in schizophrenia with brain-imaging techniques might not be included. This could be due to search limitations such as time constraints, overlooked relevant search terms, and/or articles published after the search period. It could also be due to researchers not labeling their experiments as MSI-related even though they might be. It is possible that the unintended exclusion of relevant search terms could have affected our findings. One possible way to address this in future reviews could be to add search terms like "sensory", "illusion" and "speech" so that articles not labeled as MSI-related could be included in the initial search process. The definition of MSI chosen by the authors, also affecting the inclusion process, could be categorized as too wide or too narrow compared to other definitions. Since the identification of articles in the three databases was divided between the two reviewers, there is a risk of bias in the selection process.
If the reviewers had performed the identification and screening of articles in all three databases individually to make sure the same articles were included for assessment of eligibility, it would have decreased the risk of bias and the risk of missing relevant articles.
Since this review focuses on an emerging research field, it is based on a rather limited sample with 17 included articles. Furthermore, because the dataset is the same for some of the articles, the number of included participants is even smaller than what perhaps could be expected (SP: n = 282; HC: n = 284). There is no optimal number of included articles in a systematic review, and the number of relevant articles is highly dependent on the size of the research field. However, with only 17 included studies, a single overlooked article could affect the results and conclusions of the systematic review, whether it points in the same direction or not. Furthermore, a limited number of participants increases the risk of making a type II error (i.e., not finding a difference that would be visible in a larger sample). This risk is even greater with greater variance in group characteristics. Even though all groups from the included articles were evaluated within the NOS (Wells et al., 2012), and considered equal, there is still a possibility for some variance. For example, one study included neuroleptic-naïve first-episode schizophrenic patients while others included patients with illness durations up to 16 years.
A difficulty in this systematic review has been to compile and interpret the included studies that are quite heterogeneous. The experimental designs differ greatly due to different focuses in aim. For example, some of the studies focus on abstract thinking ability rather than MSI per se, some compare unisensory to multisensory conditions and others incongruent versus congruent. Therefore, there is a possibility that the aberrant neural activity is better explained by differences in experimental design, for example complexity of the task or if the task was social or non-social. However, we could not find any connections between results in neural activity and type of task. In our results, both social tasks and non-social tasks were associated with differences in neural processing between SP and HC. This further strengthens the assumption that MSI processing is aberrant in SP, since it cannot be explained with the tasks having social elements. Still, it is possible that there are connections in more detailed measures between different tasks and activity or connectivity, but that this is not evident in the present review. Furthermore, it could be that the study sample is too small and too versatile to find underlying experimental factors behind the outcomes. There is a possibility that these differences will become evident when more research is available. For example, underlying factors could be how MSI is defined and computed, differences in stimulus characteristics and/or analytic methods, and how conditions are compared.
A further limitation is that schizophrenia is a heterogeneous disorder (Mohr et al., 2004). For example, not all individuals diagnosed with schizophrenia show catatonic behavior, even though it is a symptom of the disease. Therefore, it is a possibility that the inconsistencies showing in this systematic review are a consequence of that heterogeneity. It could be that not all individuals with schizophrenia have impairments in MSI, at least not to the same extent. Furthermore, some of the individuals with schizophrenia could have found ways to compensate. Due to the differences in design, we cannot draw any conclusions concerning under what specific conditions MSI is affected in schizophrenia or its behavioral manifestation. However, the combined experimental designs in these studies mirror how people come across MSI in their everyday life, increasing external validity and the possibility to generalize results. The variety in group characteristics and the range of different types of design are both representative of reality, clarifying that the differences we can see from this systematic review are evident for several types of conditions and across groups.
The aim of this systematic review was to compile and analyze evidence of MSI impairments in schizophrenia using brain-imaging measures. However, although aberrant activity was a prominent finding in this review, it was not clearly reflected in the performance measures. Five out of the thirteen articles that reported performance measures found impairments in response accuracy, RTs or both (Roa Romero et al., 2016a; Stone et al., 2011, 2014; Straube et al., 2013; Wroblewski et al., 2020). In six studies, differences in neural activity were found, but no difference in RTs, accuracy or perceived illusions (Hanlon et al., 2016; Liu et al., 2016; Müller et al., 2014; Roa Romero et al., 2016b; Szycik et al., 2009, 2013). (Note that Szycik et al., 2013, report the same performance results as Szycik et al., 2009, and that these two are counted here as a single study.) Even though behavioral performance was not the focus of this review, the authors believe it is important to reflect upon this result. It is possible to argue that it affects the validity and importance of the findings negatively. What conclusions can we draw from differences in neural activity between SP and HC when these are not reflected in expected behavioral differences?
In this systematic review, all sensory modalities were covered in the search process, but only audiovisual studies met the inclusion criteria. This indicates that the majority of studies on neural correlates concern audiovisual integration, and that the integration of other senses might be somewhat neglected in the empirical research. As mentioned in the introduction, it is well known that schizophrenia is associated with impairments in several sensory modalities. To understand the impairments in MSI fully, along with the associated neural correlates and clinical consequences, it is important to investigate other sensory modalities (e.g., tactile, olfactory, taste). Additionally, it is important to investigate impairments in multisensory integration of bodily signals (i.e., somatosensory input), especially since somatosensory impairments have been theorized to explain the "self-disorders" in schizophrenia (Postmes et al., 2014). Self-disorder is the difficulty to distinguish between self and other, which may cause passivity symptoms characteristic of schizophrenia. Multisensory impairments probably affect both environmental and bodily sensory signals in schizophrenia.

Conclusions and further directions
This is the first systematic review examining how the neural correlates during multisensory integration differ between individuals with schizophrenia and the healthy population. We found aberrant and reduced neural activity measured with EEG, MEG and fMRI, presumably reflecting deficits in multisensory integration in schizophrenia. This was evident in several brain regions involved in multisensory integration, mainly in the temporal cortex (i.e., STS/STG, MTG and FG) as well as in frontal (i.e., IFG) and occipito-parietal (i.e., the dorsal stream) regions. However, results indicate that MSI could be intact during some conditions. Furthermore, differences in neural activity were not always reflected in performance deficits.
Further research could investigate the possibility that MSI is reduced in schizophrenia patients only during some conditions to find out the underlying mechanism(s) triggered by those specific conditions. Since this systematic review did not see any clear differences in neural correlates in schizophrenia patients when performing an audiovisual task categorized as social instead of non-social, it would be interesting to investigate this further. Moreover, clarifying the role of other sensory modalities like the tactile modality, and the integration of somatosensory input in schizophrenic self-disorders, seems justified. The relationship between deficits in unisensory and multisensory processing is unclear and future research ought to continue to determine how they are related. If some individuals are able to compensate for deficits, future research on how they do so could benefit others who suffer impairments in MSI.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment
A version of the present study was presented in a Master's thesis (Norgren and Gröhn, 2020), with only abstract published, at Karlstad University, Karlstad, Sweden.