Introduction

In this study, we propose that interaction analysis is based on a set of theories about the human body and embodied interaction that are manifested in various professional methods of seeing (Goodwin 1994; Goodwin and Goodwin 1996). The field surrounding the microanalysis of video-recorded interactions involves various methodological practices—ways of professional vision—or “socially organized ways of seeing and understanding events that are answerable to the distinctive interests of a particular social group” (Goodwin 1994: 606). We still know relatively little about how selecting a specific type of professional vision influences the way in which the social behavior of participants in video data can be seen. In addition, the way in which these modes of professional vision are locally deployed and made sense of by the researchers in their moment-by-moment interaction is still widely under-elaborated.

Recently, there has been a broader tendency in the humanities and interaction research to focus on bodies in interaction (Deppermann and Streeck 2018; Nevile 2015), turning the analytical focus on multimodal (Mondada 2014, 2019a, 2019b), multisensorial (Goodwin and Cekaite 2018; Mondada 2019a) and intercorporeal (Cekaite 2010; Katila 2018a; Meyer et al. 2017; Streeck 2013) aspects of interaction. Accordingly, microanalytic studies have started to pay more attention to previously neglected forms of embodied behavior, such as touch, affect, and intimacy (Katila 2018a, b; Goodwin 2017; Goodwin and Cekaite 2018).

This increased microanalytic focus on bodies and on embodied interaction makes relevant a reflection on the large variety of analytical practices in the field. While such practices may be adopted simultaneously or be closely intertwined, it is rarely acknowledged that interaction research contains perspectives which view the human body and make sense of embodied interaction differently from one another (see Dicks 2014). The lack of discussion on this matter is partially due to the empirical focus of microanalysis, which has led to limited reflection about its theoretical underpinnings: microanalysis has primarily been understood as a set of methods with which to study naturally occurring human interaction. In the process of separating method from theory, the role of the researcher’s theoretical perspective, which is manifested through and thus inseparable from the method, is diminished.

Importantly, microanalysis in all of its forms has clear roots in ethnomethodology, which is a field of inquiry established by Harold Garfinkel. Garfinkel wrote of “ethno-methods,” the ways in which members of a society themselves make sense of their social lives (Garfinkel 1963, 1967; Garfinkel and Sacks 1970: 16f.). In ethnomethodology, the researchers’ task is to uncover the members’ own sense-making practices in their everyday lives. However, the uniqueness of Garfinkel’s perspective, in Lynch’s (1993: 9) words, “was not that he wanted to study ordinary methods of practical reasoning but that he disavowed the privilege of an academic or administrative science” (emphasis original). In other words, the ways in which laypersons in their everyday lives make sense of their surroundings, including the people and events they encounter, are not treated as any less accurate than scientists’ perspectives on these matters.

In the practice of microanalysis, therefore, the influence of its ethnomethodological roots manifests in the prioritization of empirical material over predefined theoretical categories, meaning that it is the researcher who is tasked with the role of illuminating how the participants themselves in video-recorded interactions make sense of the social events they are engaged in. However, from this perspective, the role of the researcher in interpreting the participants’ actions through specific microanalytic practices is not often discussed. Furthermore, while there exist a few studies that have analyzed researcher interaction in data sessions (e.g., Tutt and Hindmarsh 2011), there is still little research on how practicing microanalysis involves the researchers’ bodies and their spontaneous approaches to making sense of their participants’ behavior. In this study, we explore different microanalytic practices that are used to make the participants’ perspectives observable, and we reflect upon how the various perspectives of microanalysis are manifested in local interaction between researchers.

We take as our example affect and emotion—which we use here interchangeably to refer to the range of embodied phenomena wherein emotion, affect, and feelings are intertwined with social behavior (Ruusuvuori 2013: 331f.). We investigate how a trio of professional visions––multimodal conversation analytic, co-operative, and intercorporeal––applied to microanalysis enables researchers to view a particular type of emotion, romantic affect, in video-recorded interaction. While we acknowledge that these frameworks are often applied together in microanalytic work, our goal in bringing them together is to highlight the way in which they capture the role of bodies and embodied actions in social interaction.

Traditionally, the focus of interaction analysis has been on emotional displays rather than on emotions per se, aiming to avoid making statements about whether or not participants are really sensing the emotions they project (Ruusuvuori 2013: 332). This way of seeing emotion, as an external behavior, overcomes the individualistic view of emotion but in turn reinforces a dualism between inner and outer behavior. However, emotions are a form of embodied phenomena that are not always explicit in our behavior or easily pinpointed in any specific body movements. What is more, while people have been shown to pay careful attention to how they perform themselves in front of others, emotions are not always under the control of the body (Goffman 1959, 1961). Therefore, viewing emotions as explicit actions limits the ability to analytically “see” the experienced and involuntary side of emotions. The recent intercorporeal perspective (Meyer et al. 2017) on embodied interaction allows for the exploration of the experienced and embodied aspects of emotion; however, analytic observations of intercorporeal forms of sociality are harder to express in scientific terms because they are also recognized through embodied experience by the researchers.

The article is divided into two sections: (1) the theoretical-methodological roots and some current perspectives of microanalysis and (2) a microanalytical case study. The first section is divided into three parts that each elaborate an approach to microanalysis: multimodal conversation analytic, co-operative, and intercorporeal. In the second section, we introduce our own microanalytic case study, where we analyze the interaction between participants who have romantic feelings for each other. We present an analysis of two types of video data: the first is a video of an interaction between a new romantic couple, and the second is a video of us, the researchers, reflecting on our observations about the data. In the first part of the analysis, we exemplify how using multimodal conversation analytic, co-operative, and intercorporeal professional visions to conduct microanalysis allows us to see affect in interaction differently, as well as how these ways of seeing can nevertheless complement one another. In the second part of the analysis, we consider how microanalysis is often fundamentally manifested in spontaneously adopted and embodied ways of making sense of the participants’ social behavior, co-produced in local interactions between researchers.

Microanalytical Perspectives on Human Action

Following pioneers like Garfinkel (1967) and Goffman (1983), video-based microanalysis of interaction rests on the notion that social interaction is something that is organized through the observable actions and practices of participants, with which members of the collective make sense of each other’s actions using a wide spectrum of embodied resources (Goodwin 2018; Goodwin and Cekaite 2018). This focus on publicly available actions has clear roots in Garfinkel’s ethnomethodology, in which human behavior is viewed as ordered and accountable—observable and reportable (Garfinkel 1967: vii). This emphasis on publicly observable participant behavior rather than on “whether they’re ‘thinking’” (Sacks 1992: 118) has been designed as a response to the psychological perspective, which sees human behavior as an outer expression of inner psychological processes. However, microanalysis does not consist merely of neutral observations that uncover the participants’ practices but—we argue—also manifests in theory(ies) of human action, expressed by each researcher through their own historically learned and embodied professional vision (Goodwin 1994).

A crucial aspect of microanalysis is its focus on studying the participant perspective: how the participants themselves make sense of their own and others’ behavior through publicly available, moment-by-moment behavior. As noted by pioneering conversation analysts Schegloff and Sacks (1973: 299), an ongoing concern for interactors is the why that now—that is, the reason that the other interactors’ embodied behavior unfolded now, in that specific way and in that specific manner. Thus, the task of the analyst is merely to bring forth how the participants themselves make sense of the why that now, instead of watching the data through the analyst’s own expectations about the relevance of certain aspects of interaction (Garfinkel 2002: 171; Schegloff 1992). To accomplish this, microanalysis has focused on identifying communicative practices—microlevel methods to produce embodied and communicative actions (Schegloff 1997)—that can be turned into explicit evidence of the existence of the social phenomena in question (Schegloff 1992). The impact of the principle of focusing on explicit behavior, on the “participant perspective,” in microanalysis remains to a large extent under-studied.

In what follows, we reflect on how the multimodal, co-operative, and intercorporeal approaches to microanalysis each shape the manner in which participants’ embodied behavior can be viewed.

Multimodal Conversation Analysis and the Microanalysis of Interaction

Conversation analysis (CA) was developed in the late 1960s and early 1970s by Harvey Sacks, Emanuel Schegloff, and Gail Jefferson to study the organization of ordinary conversations (Sacks et al. 1974). Since its early appearances, CA has taken on many forms, but the basic principles, drawing from ethnomethodology, include studying how participants understand and respond to one another in their conversational turns or, precisely, “talk-in-interaction”. The production of immediate conversational action proposes a here-and-now definition of the situation, to which subsequent talk will be oriented and bring out an interpretation, thus forming sequences of action (e.g., Heritage and Atkinson 1984).

Along with the wider “embodied turn” in the social and human sciences (Deppermann and Streeck 2018; Nevile 2015), conversation analysts have widened their interest into bodily expressions in addition to verbal action. This focus of CA has been framed as multimodal CA (e.g., Deppermann 2013; Haddington et al. 2014; Mondada 2014, 2019a, 2019b). Multimodal CA has become the prevalent perspective on video-recorded interaction. In contrast to only analyzing the organization of verbal behavior, Mondada (2019b: 64) says that multimodality

includes an interest not only in talk, gesture, and gaze, but more radically in the entire body—body posture, orientation, body-torque, and body movements. This not only concerns the individual participants, and their simple coordination, but also concerns the interactional space they visibly, dynamically, and specifically design and configure within the ongoing course of action.


According to this view, human action is organized through “multimodal gestalts” (Mondada 2014), where different multimodal resources are combined in diverse manners. Emotional displays, for instance, are products of the collaboration between body posture, facial expression, and tone of voice (Ruusuvuori 2013). The analysts’ task is to uncover the temporal coordination of these modalities in the production of action. The multimodal interactional perspective has uncovered the multiplicity of ways in which people use a variety of communicative channels to accomplish complex interactional tasks, such as interacting while bodies are in motion (Haddington et al. 2013) or conducting multiple activities at once––that is, engaged in “multiactivity” (Haddington et al. 2014). More recently, the multimodal principle has been applied to the study of “multisensoriality” (Mondada 2019a), where sensing is approached as a set of sensorial practices that are organized in relation to other multimodal resources.

To conclude, the multimodal conversation analytic professional vision views human bodies and embodied behavior as being divided into modalities, which are seen to create “gestalts” of action. This affords the analyst the opportunity to identify the multiple modalities participants employ simultaneously and in concert with each other to produce action. However, it also produces an atomistic view of human bodies as split into modalities, understanding embodied phenomena, such as sensing or affect, as “observable” behavior, that is, not as experienced or “internal” behavior. Moreover, it implies an instrumental view of bodies and their modalities as objects “used” to produce action (see Streeck 2013: 70).

Co-operative Action and the Microanalysis of Interaction

The co-operative action theory—established by Charles Goodwin (1979, 2000, 2013, 2018) throughout his career—provides a way to see human behavior as inherently co-operative action. While it was developed concurrently with traditional conversation analytic writings and is sometimes viewed as part of the conversation analysis tradition, there are some fundamental differences between co-operative action theory and CA. Goodwin initiated this perspective already in his early work, for which he used videos to uncover how listeners co-participate in the speaker’s verbal action, thus co-orchestrating the trajectory of the speaker’s action through their moment-by-moment gazing behavior.

Goodwin’s co-operative view is in part drawn from Erving Goffman’s (1981) ideas of participation roles, as well as strongly influenced by the research of Marjorie Harness Goodwin (1980). For instance, M. H. Goodwin argues that gaze behavior and the process of “mutual monitoring” (Goffman 1963: 18) between the speaker and listener during ongoing talk enables the listener to “produce nonvocal displays of their own that provide information about their understanding of the speaker’s talk,” which “might then be consequential for the ongoing organization of the speaker’s actions” (Goodwin 1980: 303).

According to the co-operative vision, human action is seen as produced by this co-operative participation through which human beings inhabit one another’s social actions (Goodwin 2013). On the one hand, the participants of interaction are not seen to produce social actions alone; instead, interacting bodies constantly monitor and participate in one another’s actions as they unfold. This co-operative action consists of various creatively adopted semiotic resources that are used in transformation to recycle some aspects of the ongoing substrate—the set of semiotic materials being worked on (Goodwin 2000, 2013, 2018). On the other hand, inhabiting one another’s actions refers to a longer timescale of ecologies and the development of language and other communicative resources. According to this view, human action constantly accumulates with the transformative recycling of resources in intimate connection to practical actions, the usage of tools, and environmental resources.

While C. Goodwin’s co-operative approach to interaction considers action to be inherently multimodal, his framework goes beyond multimodality as it focuses on the spontaneous and creative adoption of qualitatively different types of semiotic resources, for instance a hopscotch grid, gestures, and vocal resources (see Goodwin 2000: 1494). Goodwin uses the term “semiotic field” for a specific form of semiosis, while “contextual configuration” refers to a set of material and language resources that participants in the semiotic field demonstrably attend to in moment-by-moment action (Goodwin 2000, 2013, 2018).

However, semiotic resources in the co-operative view are seen as something used by human bodies who participate in action. According to this understanding, for instance, emotion is viewed as the display of an emotional stance (Goodwin et al. 2012; Katila and Philipsen in press). Moreover, the idea of participation sees human conduct as organized into “frameworks of social action” and, thus, the posture of bodies as being harnessed for producing social action and not, for instance, as simply resulting from the tiring or aging of the body (Streeck 2018). In other words, instead of seeing the body as material and living, with functions such as sensing pain, feeling emotion, or tiring, the co-operative action theory views the human body primarily as a producer of social action.

Both the frameworks of multimodal CA and co-operative professional vision allow for the analysis of moment-by-moment collaboration of different modalities, semiotic resources, and practices that are at play in interaction. However, these ways of seeing human conduct make it more difficult to capture the embodied, experienced, and affective aspects of behavior, let alone the physical functions of living bodies, such as pain reactions (Guo et al. 2020).

Intercorporeality and the Microanalysis of Interaction

Recently, there has been a novel tendency in the field of microanalysis to adopt an intercorporeal view of the human body and embodied behavior. This phenomenological understanding stems from the writings of Merleau-Ponty (1962, 1968, 2003), who, drawing from Husserl’s (1982) work, developed the concept of intercorporeality, which refers to primordial relationality as well as embodied understanding between living bodies. Intercorporeality denotes the idea that social meaning is grounded in a shared embodied experience (Meyer et al. 2017: xiii). Human beings, while in each other’s co-presence, are continuously sensing others and simultaneously being sensed by them through visual, tactile, and other sensorial systems (Crossley 1995; Low 2003, 2009: 209–228). This invariable mutual perception contains a basic level of communication and interaffectivity—affecting and being affected by others at the same time (Fuchs 2017).

Early applications of Merleau-Ponty’s (1962, 1968, 2003) ideas to microanalysis can be seen in Cekaite (2010), in her initial paper on touch in interaction, and in Streeck (2013), in his studies on gesture. They have since been adopted by microanalytic studies focusing on touch between adults and children (Katila 2018a, b; Goodwin and Cekaite 2018) as well as gestures and body posture (Cuffari and Streeck 2017; Katila and Philipsen in press; Streeck 2018). However, intercorporeality was not fully introduced as a perspective for interaction analysis until the edited volume on the intercorporeal forms of sociality (Meyer et al. 2017). Studies in Meyer and colleagues’ volume incorporate Merleau-Ponty’s concept of an intercorporeal understanding of bodies in the microanalysis of video-recorded interaction: the idea that bodies are, from the very beginning, lived and experienced materials of the world and, as a result of this inherent interconnectedness with the world through materiality, fundamentally social beings. A phenomenological perspective on microanalysis thus means that the research participants’ bodies and their embodied interaction are understood as being elicited through the embodied experience of living bodies and their openness to the world through perception, which is already fundamentally regarded as being relational and intercorporeal (Crossley 1995: 57).

Apart from a few exceptions (e.g., Goodwin and Cekaite 2018; Katila and Philipsen in press), the intercorporeal approach to video analysis of affect and emotion in interaction is still a rather unexplored field. Intercorporeal understanding changes the paradigm of emotions surrounding both inner sensations and outer performances into something experienced by living bodies in togetherness. According to the intercorporeal perspective, emotions are neither mere “displays” nor objects that a body “produces,” “adopts,” or “uses”. To borrow Merleau-Ponty’s (1962: 184) understanding of experiencing emotion in others, “I do not see anger or a threatening attitude as a psychic fact hidden behind the gesture, [sic] I read anger in it”. Although we can solitarily experience emotions, given our primordial interconnectedness, the individual sensations of emotions cannot be separated from their roles as signals to others. Consequently, from an intercorporeal perspective, bodies are not distinguished as inner and outer behavior; instead, they represent socially shared meanings that are thought to originate from shared embodied experience.

With regard to interpreting meaning in interaction, from an intercorporeal perspective it can be understood that researchers of embodied interaction, through their embodied and empathetic experience when watching their video data, are able to glean some of the experienced aspects of the research participants’ emotions and affect. However, working within the intercorporeal framework does not mean that others would be transparent to us or that either those co-present in the moment or researchers would directly recognize the real meaning of participant behaviors (Andrén 2017). As much as our bodies are the same, we are unique, with unique bodies, histories, and experiences that mold how and who we are; a research participant’s body feels different from that of a video analyst (see Behnke 1997). However, while researchers can never entirely capture the exact lived experience or the “unique adequacy” (de Montigny 2017: 352; Garfinkel 2002: 175) of the participants’ realities, ethnographical knowledge of the background and the research participants can be an essential resource for making sense of their social encounters. Researchers are able to draw on this background knowledge to analyze phenomena in the video data by considering the research participants’ history and the context of the lifeworld being studied.

While Merleau-Ponty’s intercorporeal understanding of bodies is not directly connected to ethnomethodology, the microanalysis of video-recorded interaction in general has clear roots in ethnomethodology—which again derives from phenomenology. For both Merleau-Ponty and Garfinkel, the notion of embodiment is central, and both draw much of their thinking from Husserl’s phenomenology (Heritage 1984; Lynch 1993: 117–158). For instance, Garfinkel used an experiment with inverted lenses to compel his students to discover the importance of embodied presence in practical actions. By undermining the process of achieving the “self-evident” details of bodily actions, the lenses presented the possibility of becoming strange again with the practices of what Garfinkel calls the “endogenous embodiment of practical actions”. Thus, for Garfinkel, the basis of sharing meanings for human beings is the fact that they are embodied beings who are “embodiedly” engaged in ordinary actions (Garfinkel 2002: 209f.).

At first sight, an intercorporeal view of bodies is in fundamental conflict with the instrumental and atomistic view implicated in the multimodal CA and co-operative perspectives. However, while our bodies’ forms of sociality, even verbal actions and the “usage” of conventionalized language, can be seen as experienced and intercorporeal, this does not mean that they cannot also be perceived as co-operative and multimodal. Despite the differences in how they consider the human body and action in analysis, we argue that it is fruitful to combine multimodal CA, co-operative, and intercorporeal perspectives in microanalysis. The intercorporeal perspective includes the researcher’s empathetic body and embodied experience in the data analysis and thus allows for the recognition of embodied phenomena. However, there are challenges in proving their existence without allocating attention to the co-operative and multimodal CA perspectives on the various modalities that are at play: the co-operative perspective and multimodal CA provide the analytical resources necessary for “giving evidence” about the existence of a specific participant’s sense-making process. All of these perspectives manifest intentional, motivated, and professional actions: learned and habitualized ways of seeing from the perspective of a certain profession (Goodwin 1994), here that of an interaction researcher. Consequently, while the intercorporeal perspective is based on a professional vision that involves the researcher’s embodied and empathetic experience, it still represents a form of goal-directed analytical orientation toward video data and should not be confused with the “natural attitude” (see Garfinkel 1963: 210–217; Schütz 1962: 207–259) of the research participants.

In the next section, we continue this discussion by presenting an empirical example of our own microanalytic study. We exemplify the intercorporeal and multimodal/co-operative professional visions applied to microanalysis and discuss how these could be adopted together.

The Microanalytic Process of Identifying and Analyzing Romantic Affect

To collect authentic samples of everyday interactions, Researcher 1 (R1) regularly video-recorded interactions in her home and the places she visited between people she would have interacted with whether or not she was recording them. These video recordings are, hence, unmotivated in the sense that they were not recorded to study any particular interactional phenomena. R1 herself was involved as a participant in the recordings.

The current study began as R1 and Researcher 2 (R2) watched R1’s video recordings, aiming to learn what types of data could be used in a study. The phenomenon of interest was identified when, while observing the episodes, R2 expressed her feeling that, in one of the excerpts, the two participants seemed to have recently fallen in love. When she pointed out this observation to R1, R1 told R2 that the participants in the video had indeed recently met and started dating. Prior to this discussion with R2, R1 had not paid any specific attention to the episode even though she was herself another participant in the interaction. The conversational content of the excerpt was mundane and occurred while the two participants were working at computers side-by-side, both parties doing their own work. However, after seeing the interaction again with R2, R1 also noticed the affective atmosphere in the encounter to which R2 was referring.

In what follows, we analyze this episode (Extract 1). We illustrate the case using verbal conversation transcriptions and still images from the video. To protect the identities of the participants, still images were reproduced using line drawings. The verbal transcription conventions, which were modified for our purpose from the work of conversation analyst Gail Jefferson (2004), are presented in the Appendix. In the textual transcriptions, the lower part shows the original conversation in Finnish (in italics), while the English translation is presented above.

Extract 1 presents an interaction episode between a white heterosexual couple, whom we call here Anna (A) and Oliver (O). When the episode of interest begins, the participants have just started to work together at home. They are sitting next to each other at the kitchen table in Anna’s home, with their laptops on the table (Fig. 1). However, when the episode starts, they engage in shared attention aimed at Anna’s laptop. It is the Christmas season, and there are decorative candles on the table. The participants have been quiet for several minutes when Anna breaks the silence by complaining about her computer being slow. Oliver immediately provides a solution to Anna’s practical problem: to install the “Dropbox” desktop application onto her computer, as Anna’s initial problem was that it took a long time for her to open a file from the Dropbox internet platform.


Extract 1. The Initial Episode.

[Figures a and b: images of the Extract 1 transcript, not reproduced here]

The general storyline of Extract 1 consists of Anna first complaining about her computer being slow, Oliver responding with advice about how to make the computer faster, and, finally, Anna declining his advice.

Intercorporeal Lenses Applied to Microanalysis

From the intercorporeal perspective, a researcher can to a certain extent co-empathize when the participants share a strong affective connection or resonance (Fuchs 2017). Participants’ bodies are not merely presenting but also living the romantic affect (Merleau-Ponty 1962). Accordingly, it can be sensed that, for the participants, a great deal of the interaction is about sharing a moment with one another. This peculiar affective and emotional intercorporeal experience is hard to put into words: it is a kind of cute awkwardness that resonates across the bodies of two people who have a crush on each other. Even when they are not physically touching, the bodies seem to be touching from a distance (see Fulkerson 2012), intertwined with one another’s corporeality and attuning to one another’s tone of voice, which vibrates affect throughout their bodies. Furthermore, they are intensely engaging in reciprocal gazing, reflecting and mirroring each other’s expressions like a corporeal couple-dance from a distance, composed of micromovements intertwined with one another.

Thus, instead of starting by mapping the semiotic resources and modalities that would produce affect, the intercorporeal perspective starts from a more holistic view of how the participants are with one another and react to one another, that is, the affective vibration and the intercorporeal attunement that they share. This engages the researcher’s embodied experience when looking at the data. The intercorporeally elicited affective meaning is recognizable by researchers not only because of our own encultured and historical bodies and knowledge of the research participants’ embodied histories (Scollon and Scollon 2004; Sacks 1992: 226), but also through a type of professional vision that focuses on the embodied and experienced side of affect: its intercorporeal aspects.

The intercorporeal approach thus allows a researcher to “see” affect through his or her body’s eyes without immediate pressure to locate emotion in the co-work of various modalities and semiotic resources. It is crucial that the bodies of participants are seen as living the studied emotion and affect, and that this living is simultaneously expressive. Thus, they are not seen as simply adapting their bodies to display certain emotions, and emotion is not seen as a result of embodied conduct. While the bodies’ ways of perceiving and acting in the world are embedded with cultural meanings and language (Crossley 1995, 2003), emotions are fundamentally treated as experienced. However, from the intercorporeal perspective, it is hard for researchers to provide “evidence” of the existence of affect, as their bodies are the research instruments that recognize the action. Moreover, the intercorporeal perspective by itself does not directly afford resources to analytically pinpoint the structured and ordered aspects of communicative practices and signs.

Multimodal CA and Co-operative Lenses Applied to Microanalysis

Another form of microanalytic professional vision is to see affective phenomena in terms of the collaborative work among different modalities. Not only as bodies skilled with particular cognitive abilities but also as interaction researchers who have developed a specific type of professional vision, we are able to describe how the action unfolds between participants moment-by-moment through the division of labor between different modalities, such as gaze direction, facial expressions, body movements, and words, that contribute to the type of social engagement that is occurring—namely, romantic affect.

Already when the extract begins, it is possible to see that Anna produces a “complex multimodal gestalt” (Mondada 2014: 98) of emotional display (Ruusuvuori 2013: 331). It consists of a slightly frowning face (she squints her eyes while furrowing her forehead and lifting it upwards, her mouth square-shaped with her teeth showing slightly), a slight distancing of her body from the laptop, and not using her hands, demonstrating her inactivity and “doing waiting” (Fig. 1). She directs her gaze and body toward the laptop, using this action as a resource to create an environmentally coupled (Goodwin 2007) relationship between her complaint and the computer, thereby making it immediately evident that her complaint has to do with something that she sees or experiences on the device: here, the slowness of the online version of the Dropbox program.

However, immediately after Anna produces her complaint, she laminates her utterance with a laughing expression (line 02), retrospectively lightening her complaint and mitigating its seriousness (Ruusuvuori and Peräkylä 2009), which also contributes to the subsequent transition in the emotional atmosphere. Thereafter, it is possible to view the unfolding romantic affect as a display of a number of embodied resources that have been associated with positive affective attunement, such as mutual gaze, smiling, co-laughter, a softer pitch of voice, and mirroring each other’s gestures or body postures (Goodwin and Cekaite 2018; Jefferson et al. 1987; Speer 2017).

While Oliver is providing instructions about Dropbox (lines 04–05), Anna produces another multimodal gestalt of emotional display, gazing directly and intensely into Oliver’s eyes, opening her mouth slightly into a round shape, and blinking noticeably with her eyelashes a couple of times (Fig. 2). Oliver then turns his gaze toward Anna, and their eyes meet for a moment before Anna withdraws from their mutual gaze and attends to the apparently relevant item: her laptop (Figs. 3, 4). Even when Oliver points at Anna’s laptop (Fig. 4) and continues to talk about Dropbox (line 05), and even when Anna does look at the laptop, she positions her body away from the laptop and, therefore, does not show an active interest in it (Fig. 4).

Here, we can witness a transition in the emotional atmosphere from “work-orientation” and talking about Dropbox into a co-operatively accomplished romantic action. While Oliver is orienting himself more toward his instruction on the usage of Dropbox, Anna participates in his action with emotional displays that laminate the apparently task-oriented interaction with acts of romantic affect. Anna uses several semiotic resources through which she makes her affective intent publicly available. For instance, she receives Oliver’s instructions with a verbal response (“yeah,” line 06), which is accompanied by a soft but high-pitched tone of voice, a square-shaped mouth, a slightly turned head, and half-opened eyes that display interest toward Oliver and not the laptop. Oliver constantly monitors (Goffman 1963: 18) the subtle interaction cues and adopts them in his own actions by speaking (“and then it is easier to do,” line 07, Fig. 5) using a lowered and, by the end of the sentence, almost inaudible voice. This moment-by-moment transition in the tone of voice shows Oliver’s step-by-step dropping of his interaction project (Levinson 2013: 126) of giving advice while attuning to Anna’s embodied gestures. Moreover, by the end of his speaking turn (Fig. 6), Oliver ends up shifting his gaze toward Anna and producing a multimodal gestalt that involves pouty lips and raised eyebrows just as Anna turns her gaze toward Oliver in response. These emotional displays merge into a co-produced gestalt of an affective moment, where Anna inhabits (Goodwin 2013) Oliver’s gestalt by adopting a coy smile and verbal agreement (line 08, Fig. 6).

During this brief moment of gazing and smiling (Fig. 6), the two co-operatively create an emotional atmosphere rich with mutual engagement. Subsequently, and with the same facial expression, Anna responds to Oliver’s advice: “I don’t feel like doing it” (line 09, Figs. 7, 8, 9). Anna produces the words in a highly marked manner: by lengthening the word “don’t” while moving her head backward and taking a considerably long pause before continuing by saying “feel like doing it”. Moreover, as the content of her response does not provide a proper reason for not taking Oliver’s advice and is said with a particular tone of voice, along with a gentle or coy smile, her response can be read as embedded within an affective interaction project whose explicit meaning is left designedly ambiguous (see Speer 2017: 129). The episode ends with Oliver smiling and providing a minimal response, “yhmm,” along with a slight nod, which leads to co-laughter (Fig. 10).

Seeing the episode from the perspective of the co-operative interplay of semiotic resources (Goodwin 2000, 2013, 2018) and the production of complex multimodal gestalts of emotional display (Mondada 2014; Ruusuvuori 2013) allows us to pinpoint the communicative techniques (Mauss 1973) of the body that enable communicating affect to another person and creating affective bonds. However, the multimodal conversation analytic and co-operative frameworks for microanalysis construct bodies as users of emotional displays and semiotic resources, making it more difficult to analyze the experiential aspect of bodies and embodied, or affective, forms of sociality.

Analyzing our data from the intercorporeal and multimodal/co-operative action perspectives shows that, despite their fundamental differences in how they see the body, in the practice of analysis they can be used complementarily. The intercorporeal approach allows us to identify participant body behavior as romantic affect, an intercorporeal phenomenon that we recognize through the affective resonance, mirroring, and empathetic tendencies of the researcher’s human body (Trevarthen and Aitken 2001). The multimodal and semiotic resource perspective acts to split the observation into parts and bundles of practices that, again, are useful in explicating the results in written format and accompanying them with screenshots of video and written transcriptions of talk, a process that represents the microanalytic practice of “providing evidence” for a reader of a scientific publication. These perspectives, which are often adopted together, are not just ways to “bring forth” the participants’ publicly available methods of making sense of the situation they are in. Instead, they importantly make the participants’ actions and bodies seen in a specific way: either as experienced and living, intercorporeal organisms, or as tools for communication. Moreover, it could be stated that a crucial part of analysis for a researcher is not just reporting what is observed, but also participating in something that could be called “doing” providing evidence (see Sacks 1984) that the reported phenomena “really” happened in the interaction according to those particular terms, thus offering justification and accounting for the accuracy of the specific method in order to maintain the status of a “scientific” analysis.

Moreover, in the practice of microanalysis, the analytical perspectives elaborated above—multimodal CA, co-operative action, and the intercorporeal perspective—hardly occur in a “textbook” form in the actuality of practicing science. Instead, they have become part of the researchers’ embodied way of seeing the world through their learning of a professional skill, and as such they are not necessarily always conscious of it. Moreover, in the moment of analysis, researchers are not only researchers: they are embodied and living beings with history and life experiences of their own, attending to moment-by-moment interaction and social relationships with their collaborator(s), and as such they never entirely dedicate their bodies to analytical action. Thus, microanalytic professional visions are always occurring in the interactional and historical moment, tailored to the context and reproduced in local, moment-by-moment interaction between researchers. In the next section, we reflect upon how the researchers’ microanalytic interpretation unfolds as an interactive and embodied process, where researchers collaboratively make sense of and produce the participants’ embodied action. We elaborate this reflection by analyzing a video-recorded episode of our own research process of analyzing and discussing Extract 1.

Interaction Analysis as an Embodied and Interactive Process

In this section, we exemplify, with a brief glimpse at our own research process, that microanalysis is not merely the result of single bodies adopting specific professional vision(s). Instead, it is often a temporally layered process, accomplished through researchers’ embodied interaction and other institutionalized and spontaneous “ethno-methods” of the researchers’ bodies. Such resources include: (a) the researcher’s body, which can co-empathize with the research participants’ affect and “cite” this co-empathized experience in another context to make analytical observations available to other researchers; (b) knowledge of the context and ethnographical background of the research participants, their relationship history, and the specificities of the context; and (c) the professional skills of a microanalyst and a solid knowledge base concerning interaction analysis.

While we do not have the space to describe the entire process in detail, we show the embodied, interactive, and context-specific methods of our microanalytic sense-making process in Extract 2, which is a video-recorded episode of a meeting between R1 and R2 about the video data introduced in Extract 1. The extract describes the process through which we, as researchers, try to capture and talk about the embodied phenomena presented in Extract 1. In Extract 2, R1 and R2 are engaged in co-operative action, where they collaboratively attend to a computer: R2 is discussing her ideas, and R1 is taking notes on the computer about their discussion. The clip begins with R2 describing a scientific conceptualization of “in order to” that draws from Schütz’s (1964: 32) ideas. R1 has written about this concept, which describes the implicit motives of the body’s movements and affective expressions, in a developing article draft that R2 had read just before the meeting.

Extract 2. The Researchers Talk About the Initial Episode.

figure c
figure d

In lines 01–05 (Fig. 1), R2 starts by commenting positively on the usability of the concept “in order to”. As she only receives a minimal response from R1 (“mmm” on line 06), R2 continues in lines 07–12 to further describe the concept (“in order to”) and the reason why she thinks it is usable for the research project. The concept covers not just verbal content but all the embodied resources that are produced “in order to” do something.

Interestingly, to exemplify the embodied content that can be captured with “in order to,” in lines 14–17, R2 re-enacts—that is, she provides an embodied demonstration of past events or scenes (Sidnell 2006; Tutt and Hindmarsh 2011)—the romantic affect that she recognized from the data in Extract 1. Accordingly, on line 17 (Fig. 2), R2 utters the words “something very mundane” and, by saying these words, she re-evokes the affective meanings experienced by the participants in Extract 1 with a lowered, gentle tone of voice that resembles the tone adopted by Anna in Extract 1 (“I don’t feel like doing it”).

In other words, R2’s embodied re-enactment not only reactivates but also provides an intercorporeal interpretation of the affect in Extract 1 (Tutt and Hindmarsh 2011), which brings it alive and encourages it to be seen in a specific manner. Moreover, when saying the words “something very mundane,” R2 “body quotes” (Keevallik 2010: 401) the style of moving and being exhibited by the participants in Extract 1: she moves her body closer to R1, with a slight wave-like head motion, and then she intensifies her gaze toward R1 with a smiling face. The emphasized, even parodying manner in which R2 re-enacts the affective phenomena in Extract 1 shows that she is not directing these actions, here and now, at R1 but is recycling, with a transformation, certain affective aspects from Extract 1 in a new, but now scientific, substrate (Goodwin 2013). Thus, R2 is able to “cite” the affect she interpreted from the data and bring it into a new moment through the practice of keying, that is, transforming something into a different context in a way that enables it to be seen as something else (Goffman 1974/1986: 43f.). Body quoting is a general way in which the body makes sense of past or potential future events; here, however, R2 spontaneously and locally adopts body quoting of the affect from Extract 1 as a professional practice used to explain her understanding of the concept “in order to” and, at the same time, to produce a microanalytic interpretation of the affect evident in Extract 1.

Subsequently, in lines 18 and 19, R1 turns her gaze toward R2 and starts to agree, but is cut off by R2, who, from line 20 on, continues to further elaborate on how the embodied expression of “in order to” is available to and observable by the viewer. R2 has difficulty finding the right words to express her point. In lines 20–30, she corrects herself multiple times and there are pauses, re-starts, word repetitions, and other search-showing practices (Kitzinger 2012), which, interestingly, speaks to the nature of the phenomena she is describing (i.e., that it is hard to put into words). However, R2 expresses her idea by gesturing toward herself (Fig. 3), through which she embeds herself as the viewer who is observing the embodied meanings with and in her body. In other words, R2 is thinking about and expressing her ideas with her hands, grasping the still pre-discursive meaning as if manually making it into something tangible and shared (Cuffari and Streeck 2017). Interestingly, this describes the process of transforming embodied, tacit meanings into language and demonstrates how, in this manner, gestures can become a major part of collaborative idea building in the scientific process (Tutt and Hindmarsh 2011). Reducing perceptions into scientific language is always an embodied interpretation and can, perhaps, never be done in a way that would precisely capture the embodied meaning that was experienced.

Next, R1 steps in with a joyful display of realization (“YEAHH” and “RIGHT” on lines 31–32), which expresses that she has finally understood what R2 means. In lines 34–35 and 37, R1 says, “when you said it with that tone of the voice, I immediately got what you meant”. At the same time, R1 reciprocates R2’s self-targeted grasping gestures (Fig. 7). By recycling R2’s gesture from Fig. 3, R1 displays that something similar is happening to her as was happening to R2 in Fig. 3 (embodied realization). Moreover, R1 elaborates how she was able to understand the embodied meaning implied in R2’s enactment of Extract 1: through R2’s tone of voice. Thus, R2’s enactment of the embodiment of Extract 1 enabled the researchers’ shared understanding of a scientific concept as well as the scientific phenomena found in the data. After R1 makes her realization explicit, the encounter between R1 and R2 unfolds with a joyful moment of shared affective flooding out (Goffman 1961; Katila and Philipsen in press). The researchers laugh together and start a collaborative “collapsing,” where R1 hides her face and R2 bends forward and gazes down.

figure e

Interestingly, the intimacy and co-laughter at the end of Extract 2 resemble the intimate moment of co-laughter between the participants of Extract 1. The environments also have similarities: both are two-party interactions wherein the participants share embodied attention to both a screen and each other. Moreover, due to R2’s body quoting of the emotional gestalt from Extract 1, a similar type of emotional atmosphere has been evoked in Extract 2. However, while citing the gestures and bodies from Extract 1 evokes and makes salient in the moment of action the original (romantic) affect from Extract 1, it is not personally felt by R1 and R2 as their “own,” anchored in this very place and time. Instead, while it is empathetically recognized and experienced as such, it is also reproduced as complex and layered intercorporeal sense-making for professional purposes. This indicates the spontaneity and complexity of the embodied and empathetic abilities of the human body: co-feeling and making emotions available in new forms outside of the moment, “using” them for different purposes such as the professional interpretation of a microanalyst. In the moment of social interaction between the researchers, multiple temporalities and nested contexts emerge, including the individual history of both researchers as members of a certain culture, their historically created professional visions, and their shared relationship history as colleagues. This allows a setting where the primordial empathetic experience of bodies (of a romantic affect in the data) is interactionally transformed into a resource both for professional purposes and, intertwined with the same action, for making fun of the usage of the initial affect in a professional context.

Accordingly, because of their spontaneity, the gestures and body postures in Extract 2 are not performed for strictly professional purposes alone; they are also attending to the here-and-now social moment and the relationship between the two researchers. For instance, the laughter at the end of the episode has multiple layers. It does not simply follow the same pattern as Extract 1; rather, it exemplifies a very context-specific scientific humor (Mulkay and Gilbert 1982) that constitutes a sort of metacommentary on the fact that, here, R2 is directing the same action to R1 as Anna had directed to Oliver. Thus, the researchers are “parodying” the embodied scenario in a completely different context and social relationship. This shows how microanalysis is entirely embodied labor, manifested in local interactions between researchers where the historically produced professional vision emerges in contextually tailored and intermeshed ways.

Conclusions

In this study we have uncovered how various approaches to microanalysis (the multimodal conversation analytic, co-operative, and intercorporeal perspectives) manifest different theoretical premises and, thus, professional visions of human bodies and action. In our analysis, we took romantic affect as our example. In the first part, we reflected on how the multimodal CA and co-operative perspectives of interaction analysis produce affect as displayed in observable action, while the intercorporeal perspective emphasizes the embodied and experienced side of affect. We concluded that, at their best, these approaches can be used as complementary ways of perceiving the human body and affective behavior. While the intercorporeal framework allows for the recognition of the experienced and sensorial side of affect in our own bodies, the co-operative and multimodal perspectives allow us to situate affect in a specific moment and in identifiable body parts, forming multimodal gestalts. Thus, these different professional visions can complement one another in the scientific process. They can also allow us to capture the broader aspects of emotion and affect and, therefore, develop a more comprehensive understanding of the forms of human sociality. While our observations are based on a single study, we hope for future discussions about the differences between various professional visions in microanalysis, in order to develop microanalytical methods that better capture and understand the embodied and affective forms of sociality in particular.

In the second part of our analysis, we showed that interaction researchers, as participants in scientific interaction, are able to both live and experience, as well as use and perform, their bodies in the same ways as the research participants they study in order to make analytical interpretations. This conclusion was enabled by the embodied interaction process in which, through meaningful transformations, we recycled emotional gestures from an original context for different purposes (Goffman 1974/1986: 43f.; Goodwin 2018). Researchers are able to “use” their own affective and experienced bodies for professional purposes and utilize this embodied interpretation in association with various microanalytic professional visions. Importantly, our research shows that microanalysis is a fully embodied and interactive process that engages the abilities of the researchers’ bodies in various forms of professional vision: the empathetic ability to recognize affect and emotion as well as the ability to see human behaviors as actions divided into various modalities and semiotic resources (Goodwin 1994; Goodwin and Goodwin 1996).

We started from the notion that it is important to recognize the differences between various perspectives, as these differences have direct consequences for how the phenomena of interest can be seen by the reader. While all of these microanalytic perspectives have their roots in ethnomethodology and, thus, in phenomenology, they view the notion of embodiment differently. According to multimodal CA and co-operative action, embodiment is considered a result of the combination of different modalities and resources adopted by the participants of social interaction. In a way, the intercorporeal perspective can be seen as bringing back some of the holistic views of embodiment present in Garfinkel’s thinking, which have to some extent disappeared along the way in the history of microanalytic research. Crucially, based on our study we argue that in order to capture the complexity of especially the embodied, affective, and multisensorial forms of human sociality (the experienced and expressive, voluntary and involuntary, structured and spontaneous, lived, felt and performed, among others), it is most fruitful to adopt multimodal CA, the co-operative view, and the intercorporeal perspective of video analysis together.

In our study, we have deconstructed how these theories manifest in the act of microanalysis (what sorts of observations they allow or afford) and argued that a crucial part of microanalytic practice, as a scientific method, is to produce itself as “scientific” through practices of “giving evidence” about the existence of the phenomenon. By taking a critical approach to the idea that these methods are theory free and neutrally “reveal” or “report” the participants’ perspective, we have uncovered some of the theoretical underpinnings of microanalysis, and the role of researchers’ local interactions in the process of doing microanalytic interpretation.