Accounting for changes in series of vocalisations – Professional vision in a gym-training session

This paper presents a study of vocalisations, i.e., non-lexical sounds, in video-filmed sessions of gym training where one personal trainer (PT) and three clients are working out together. The objects of study are series of vocalisations performed in connection with series of physical exercises, and the participants' orientation to change in such series is explained using the notion of professional vision (Goodwin 1994, 2013, 2018). We use sequential analysis of the multi-modal interaction, focusing on the PT's interactional work to make changes in series of vocalisations accountable. Our results show how vocalisations are recycled by the PT, transformed in new interactional contexts and thereby rebuilt into new social actions such as correcting, criticising or instructing. The under-specified nature of vocalisations (Keevallik and Ogden 2020) gives the PT an opportunity to reuse them as objects of knowledge for the members of the group, sharing his professional vision in co-operative actions (Goodwin 2013). The study potentially contributes not only to research into vocalisations as one of humans' communicative resources for inter-subjective understanding, but also to the analysis of professional practices used for providing physical health-care. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC


Introduction
Vocalisations, i.e., non-lexical sounds, in interaction can be oriented to as meaningful by participants in an ongoing activity (see, e.g., Goffman, 1978; Couper-Kuhlen, 2014; Keevallik, 2014). In this paper, we present a study of serial vocalisations in data from personal-training sessions produced in connection with exercises involving serial repetition of a physical movement. Specifically, we show how change in a series of vocalisations is made accountable, and we analyse the interactional work performed by participants in relation to change as an accountable act.
Personal training is an activity where an expert and a client jointly engage with the client's body with the goal of improving the client's physical performance and well-being (Georg, 2008). This is an institutional context where the role of the professional participant is fundamentally interactive in nature, and the study shows how vocalisations are meaningful in relation to this specific activity. The data are drawn from a corpus of video-taped interactions during Swedish-language physical institutional activities. We conduct a multimodal interaction analysis (Broth and Keevallik, 2020; Goodwin, 2018) to show how changes in series of vocalisations are made accountable, and how this is relevant for the joint learning in a group.
Explorations of vocalisations in interaction provide significant contributions to a better understanding of the interplay of multi-modal resources in human interaction (Dingemanse, 2020; Keevallik and Ogden, 2020). We argue that Goodwin's theory of co-operative action and his notion of professional vision (1994; 2013; 2018) can help us understand how different communicative resources are used and co-ordinated in the specific context of gym training. It should be noted that the professional vision is not associated only with the construction of a professional role in an institutional activity but embraces all members of a group, bringing them together into a common apprenticeship. In Goodwin's words, the members come to "inhabit each other's actions" (2013: 8). This approach can help to explain the participants' orientation to vocalising behaviour as seen in our data, since:
[a]pprenticeship through co-operative calibration provides resources for organizing as social practice not only the actions being built by the participants, but also skilled actors who can be trusted to see, categorize and operate upon the world in the ways required to carry out the actions that define the work of their communities. (Goodwin 2013: 19)
Our aim is to show how the series of vocalisations performed by clients are made accountable and to identify the consequences of this both in terms of local sequential organisation and in relation to the goals of the specific activity.

Background
In our study, a vocalisation is defined (rather narrowly) as a non-lexical sound produced in conjunction with a serial bodily exercise where several repetitions are performed of a single physical movement. What we call a series of vocalisations is a number of repetitions of one or several such sounds reflecting the serial nature of the exercises. The exercises in question involve performing the same movement 10–12 times ("repetitions" or "reps") without interruption. Previous studies of repetitive sounds have mostly focused on lexical content and on the pragmatic functions of repetitiveness as such. One example is Stivers (2004), who concluded that multiple sayings (e.g., "no, no, no") are systematically organised in terms of both sequential position and function. Another is Simone and Galatolo (2021), who found systematicity in lexical repetitions (e.g., "up, up, up") with regard to formatting, timing and delivery in instructions about the need to keep an embodied action going. These studies both concerned lexical repetitions, whereas Harjunpää (2022) instead explored interactions between humans and pets, referring to the sounds made by the pets as vocalisations and describing how these were responded to by humans in vocal modality; the vocalisations were recycled and reduplicated close in articulation to the original pet sounds. Harjunpää (2022) illustrates a very specific case where pets' vocalisations are used in interaction and where a human party to some extent adds semantic content.
As regards the lexicalness of the vocalisations in our data, we take it that none of them is a "word" in the sense that it can be "considered to be a sound having (1) a clear meaning, (2) the ability to participate in syntactic constructions, and (3) a phonotactically normal pronunciation" (Ward, 2006: 134). Regarding (2), while we do have examples of vocalisations fulfilling a syntactic role, they never do so in a way that can be seen as conventionalised or that could be expected to be repeated for the particular syntax and sound in question. Hence our vocalisations could be referred to as "non-lexical", but Dingemanse (2020) in our opinion has a good point when, to underline the potential interactional status of vocalisations (in a broader sense, without our link to serial physical exercises), he avoids such negatively defined terms and instead uses the term liminal signs. In fact, Dingemanse wants to focus on "the importance of liminality, ambiguity, and deniability in social interaction" (Dingemanse, 2020: 191). This is well in line with the observation that the under-specification of vocalisations is one of the things that make them useable resources for interactants (Keevallik and Ogden, 2020; Hofstetter, 2020).
Because vocalisations are characterised by complexity of both form and function, they are best studied in empirical investigations of authentic interactions. One of the first studies of vocalisations in relation to social action is that of Goffman (1978), who stated that what he called "response cries" are expressions of inner states or feelings, meaning that they are not interactionally accountable and are not necessarily intended as contributions to talk. Since then, however, studies of vocalisations have shown that they are, or can be, oriented to as accountable and that they have both sequence-organisational and social functions in interaction. Such studies have been performed with a focus on verbal interaction and on vocalisations as response tokens, and they have addressed the affective functions of vocalisations, such as making assessments, indicating a change of state, displaying affiliation/disaffiliation or showing disappointment (see, e.g., Schegloff, 1981; Heritage, 1984; Sorjonen, 2002; Couper-Kuhlen, 2009). Further, one approach taken to vocalisations in interaction has been to study one specific type, such as whistling, sighing or clicks (Reber, 2012; Hoey, 2014; Ogden, 2018), in order to establish what actions they do in authentic interaction and how they are associated with local and sequential organisation. Empirical studies show how different kinds of body sounds may contribute to organising interaction; for instance, depending on sequential position a sniff can delay turn progression or indicate that a turn is completed (Hoey, 2020), and a conventionalised audible outbreathing (the Finnish response cry "huh huh") can be used by participants in assessing actions and showing solidarity (Pehkonen, 2020).
The form of vocalisations tends to be complex and thus difficult to map to a single function. Rather, the functionality of vocalisations tends to be locally and sequentially organised. For example, Helmer et al. (2020) concluded from a literature review that vocalisations of pain in medical settings have to be assessed together with other pain indicators, since the association between vocalisations and pain is complex. Indeed, empirical studies of pain displays suggest that they are socially organised (Heath, 1989) and negotiated with respect to sequential considerations and as part of multimodal configurations (La, 2018; La and Weatherall 2020; Weatherall et al., 2021). This means that vocalised pain is not merely a signal of a bodily experience.
Other settings where vocalisations have been specifically studied in relation to embodied action, with a focus on social orderliness in different contexts, are dance classes (Broth and Keevallik, 2014; Keevallik, 2014, 2015; Albert, 2015; Albert & vom Lehn, 2023) and the mucking-out of a sheep stable (Keevallik, 2018). In this line of research, vocalisations are often found to do double work. For example, in the case of mucking-out, they may represent a physical sound due to effort while at the same time signalling to a co-participant to bring a wheelbarrow to a halt (Keevallik, 2018). Wiggins has described "gustatory mmm" as an interactional construction of pleasure with a focus on its embodied features (2002) and "eugh" as a signal of disgust that is also oriented to as such by co-participants (Wiggins, 2013). Further, Hofstetter (2020) has shown how "moans" in gameplay can be treated as laughables and hence as signals of playfulness and of an opportunity to show feelings such as disappointment in a socially acceptable way. In a study of experiences from wearing an exoskeleton, Katila and Turja (2021: 22) described bodies as "fundamentally expressive, sensing and sentient beings", noting that bodily expressions are inter-subjectively accessible in interaction. Mondada (2019: 59) underlines the concept of multi-sensoriality, in which embodied sensoriality is understood as "interactionally, intersubjectively, sequentially organized".
The activity explored in the present study, that of personal training, is first and foremost a physical activity where moving bodies are in focus. It is a goal-oriented activity that aims to enhance the client's physical strength and well-being (Georg, 2008; Evans and Reynolds, 2016; Huhtamäki et al., 2022). To explain how the serial vocalisations in our data contribute to the goal of the activity, we use Goodwin's (1994) notion of professional vision. According to Goodwin, a professional vision can be shared in a group when a well-informed group member highlights a phenomenon as a knowledge object. Such a highlighting opens up a "socially organised way of seeing and understanding events that are answerable to the distinctive interests of a particular social group" (Goodwin 1994: 606). In our analysis, we explain how changes in series of vocalisations are made accountable by the personal trainer and how the interactional work performed by him can be identified as recycling, transforming and building new social actions (Goodwin 2013). These are examples of what Goodwin calls co-operative actions, and they basically characterise all human interactive behaviour. In this paper, we illustrate how such behaviour can manifest itself at the micro-level in a local context.

Data, methods and ethics
Our study draws on a corpus collected in institutional settings with not only personal trainers and their clients but also massage therapists and their clients. A total of 17 sessions were recorded in Sweden and Finland. The data consist of 16 h of video-recorded interaction; the language spoken is in all cases Swedish. The number of participants is 27, of whom 9 are professionals and 18 are clients. The main activity is physical exercise, even though verbal interaction is also present. All the vocalisations are related to the ongoing embodied activity, and they most commonly constitute sounds that may be due to physical effort, such as loud breathing.
In the selection of data for this study we began by identifying all vocalisations. This yielded over 600 instances. The definition of "vocalisation" used at this stage was broad: any audible non-lexical vocalisation produced in conjunction with the ongoing physical activity qualified. However, after a preliminary analysis, we decided to restrict our focus to series of vocalisations in a single group training session with a personal trainer and three male clients, and the feature of change emerged as a key phenomenon of interest.
The sequential setting of the phenomena focused upon was analysed to establish how the participants oriented to the series of vocalisations and hence what kind of social order the series expressed (Sidnell and Stivers, 2012). Given that the status of vocalisations in interaction can be seen as "noticeable, yet off-record, perceptible yet ignorable" (Dingemanse, 2020:191), action on the part of either the vocaliser or a co-participant is necessary to make a vocalisation accountable.
The sequences to be studied were transcribed using multi-modal annotation, to the extent that we found to be necessary for the analysis and/or convenient for the presentation of results. In fact, since the activity studied is physical, a multi-modal transcription risks being overloaded. For this reason, we focused our annotation on those embodied actions that we assessed as relevant for the aim of the study. Given that the vocalisations in our data are non-lexical and hence non-conventional, the possible methods for transcribing them run along a continuum from annotation using the International Phonetic Alphabet (IPA) to annotation using lay words such as grunt, puff and scream. We have chosen a middle way, mainly because we progressively became aware of the importance of a feature of change in the vocaliser that can be described as "more of the same": for example, an unmarked grunt followed by a grunt with more glottal pressure and longer vocal quality, making it louder. This feature is what we have tried to represent in transcriptions with conventional annotation and sound-imitating letter combinations, such as "euh" -> "EUH" for a vocalised outbreath (outbreath marked by the added "h") (cf. Wiggins's (2013) "eugh" for disgust or Pehkonen's (2020) "huh huh"; see also Keevallik and Ogden 2020 for a discussion on the relationship between form and function in vocalisations). In the descriptions provided as part of the analysis, though, we have sometimes used lay words for the sounds, and in one case we use the verb that the vocaliser himself used in a meta-comment on his own vocalisation (scream). The transcription symbols used are listed and explained in the Appendix. The English translations of the Swedish lexical material in the data have been made by the authors.
We consider gym workouts sensitive situations for both the clients and the professionals. In the case of personal training, the clients are expected to perform physically, and they are assessed and supervised. That the situation is sensitive for the clients makes it sensitive for the professionals as well, in that they must be both understanding and encouraging. Being filmed and participating in a research study might make the situation even more sensitive since the participants are aware of an observer's view. All participants, including those who accidentally came along when we were filming, have agreed to contribute to the study. Their names and other details of their identity are anonymised in the transcriptions as well as in the presentations of results. The study has been approved by the ethical authorities in the respective countries (Finland and Sweden).

Analysis
In the following, three sequences are analysed. For each extract, we show (1) the features of the specific series of vocalisations; (2) how the PT makes change in the series accountable using interactional and social actions; and (3) how the interactional work on accountability can be explained using the notions of co-operative action and professional vision (Goodwin, 1994, 2013). While our three extracts all share the second feature above (our proposal that change in a series of vocalisations is made accountable), they differ in how the change is realised and in what kind of interactional work is performed in relation to accountability.
We begin with some ethnographic observations about the institutional context and the participants. Extracts 1–3 all derive from a single group training session in a gym where four men are working out together, mainly lifting heavy weights in work-out machines. One of them is a personal trainer (PT); the other three are paying clients ("Tom", "Jim" and "Ronny"). Personal trainers will typically only have a supervising role and do not work out themselves, but in this group the PT participates. He introduces each new exercise by performing it himself in a series of ten repetitions, while the three clients are watching. Then they perform the same exercise one at a time, and those who are not lifting watch the one who is (see Image 4.1 for a typical setting). Reynolds (2017) has studied situations of heavy lifting in gyms, noting that they tend to become spots where it is acceptable to gather and watch; it can be said that a situation of this type has been routinised in our four-man group.
The members of the group know each other quite well as training partners. They meet once a week to carry out a common training programme designed by the PT, and they all come to the gym regularly to train individually or to chat with other members or each other. The PT has an active career as an athlete, and he is trainer and mentor to the youngest client in the group, Jim, who is at the beginning of his professional athletic career. The PT has a pejorative nickname for Jim that is used frequently in front of the others. The oldest client is Ronny, who trains just to keep his body in trim. He has been a frequent visitor to the gym for many years. The third client, Tom, is middle-aged. The others' nickname for him identifies him as a typical office worker. The participants frequently use a teasing tone and a rude jargon during workouts, seemingly to push each other's physical performance. Most often, it is the PT who is joking and teasing, while the other three laugh.
Because of the clients' different backgrounds and ages, the group training activity also has different goals for each of them. For Jim, it is part of a training programme designed to help him develop professionally in his sport. For Ronny, the goal is to remain in good shape for his age, while the goal for Tom is to perform a weekly physical exercise.
In all three extracts presented here, the group are working out on the same machine, performing the same exercise ("reverse pec-deck flys"). At the starting point of each repetition, the exercising client assumes a position with the arms extended straight forward and a handle held in each hand. By moving the stretched arms backwards and outwards from the body, he then lifts the weights connected to the handles. Image 4.1 shows the climax position of the exercise, where Ronny has his arms stretched in the backward position. After this, his arms go back to their starting position held in front of him. Regular breathing co-ordinated with physical movement is a recommended way of taking on this kind of exercise: inhaling before lifting, exhaling in the most strained position and then relaxing (cf. Hagins and Lamberg 2006).
The extracts presented in the following all involve what we call series of vocalisations. Each series is co-ordinated with the respective client's performance of a physical exercise involving a set with a fixed number of repetitions performed without interruption. The serial vocalisations are regular, repetitive and often upgraded with different modalities towards the end of a set.

Extract 1: accounting for not fulfilling a series of vocalisations
Extract 1 below shows how the PT is correcting the client, making him accountable for not fulfilling a serial vocalisation. The PT recycles and meta-comments on material from the client's vocalisation and makes it accountable, thereby sharing a professional vision (Goodwin, 1994) with the group members.
The features of the series of vocalisations in this sequence are linked to the breathing pattern that the client, Ronny, manifests during the repetitive lifting. In conjunction with the last push of every lift, he exhales and relaxes before going back to the starting point and inhaling. The exhalations include a vocalisation (a prolonged "ts" sound on the outbreath), which is followed by a relaxation and an inbreath. The volume and the audible quality of the first six vocalisations are similar and the PT, Tom and Jim watch Ronny in silence (see Image 4.2).
The seventh vocalisation (line 01 in Extract 1 below) is louder (TS:) but during the eighth, ninth and tenth lifts, no audible vocalisations are produced (lines 03–09). This represents a change in the serial vocalisation to which the PT responds verbally with an encouraging assessment (good Ronny, line 04) (cf. Huhtamäki et al., 2019) and with some pushing (come on, line 05). The PT then makes Ronny accountable for not fulfilling a series in the turns where he recycles part of Ronny's vocalised material (line 08) and then meta-comments on it (line 11). This we will discuss further below.
Extract 1. Group training session in a gym with a PT. PT = personal trainer, male; RON = Ronny, client; JIM = Jim, client.
We see a pattern in our data with all co-participants orienting to changes in serial vocalisations. In this case, Ronny's audible repetitive vocalisations stop, causing the professional to encourage him in his performance (lines 04–05). The co-participant Jim's subsequent instructing and evaluating actions (lines 06, 07, 10) align with the actions of the PT. This sequence can be explained with reference to Goodwin's (1994, 2013) concept of professional vision: in the PT's professional vision, the client's change in vocalising is interpreted as a signal related to the ongoing bodily performance. This change in vocalising behaviour elicits encouraging actions, which we take to mean that the interpretation made of that change within the professional vision is that it might be a sign of exhaustion in Ronny. Further, professional vision is also a concept reflecting inter-subjective understanding within a group; we see Jim's encouraging actions described above as aligning with the social organisation of the gym activity in the way indicated by the PT's professional vision. The encouraging actions by co-participants can be seen as incitements, whose function in sports settings is to make "recipients accountable to display effort" (Reynolds, 2021: 46).
In the subsequent lines of Extract 1, the PT produces a vocalisation (TS:, line 08) in overlap with Ronny's tenth and final lift, and then Ronny produces a screamy EYH (line 09), overlapping slightly with the PT's vocalisation TS:. The PT's vocalisation recycles Ronny's earlier ones from the first seven lifts of the series, which can be seen as an empathic signal showing his awareness that the serial pattern of vocalisations has been broken (cf. Weatherall et al., 2021: 7). Figure 1.1 is taken at the moment just after the PT vocalises a TS: and shows the climax position of Ronny's arms, which are stretched less far backwards than during the first lift (cf. Image 4.2); this might indicate tiredness. The PT's vocalisation represents what Goodwin (2013) calls a co-operative action, where parts of earlier turns are reused and transformed to build new actions. Specifically, the PT reuses the TS:, he transforms it by producing it not in conjunction with his own bodily effort but rather in connection with Ronny's effort, and by doing so he builds a new social action (here, the action of correcting). However, while the PT highlights a vocalisation perceived as missing, Ronny responds by producing a different vocalisation: EYH in the final push of his lift.
The actions realised by the vocalisations become clearer in the subsequent turns, where the vocalisations are commented on. The PT meta-comments on how Ronny's series of vocalisations was not completed over the course of the ten-repetition series: you dropped your s's there Ronny, then there was an (0.2) EYH (lines 11–12), integrating a vocalisation at the end of the verbal syntax (cf. Keevallik, 2014; Lindström et al., 2020). In this turn, the PT makes Ronny accountable for his vocalising behaviour while at the same time accounting for his own vocalising. Ronny aligns with the meta-comment when he recycles and co-produces the EYH in overlap with the PT (line 12 and Figure 1.2). As Figure 1.2 shows, Ronny is here about to leave the machine, with his gaze on the floor in front of him, not looking at the PT (who is standing on the left of the picture). In this position he also gives a verbal account for not fulfilling the series (line 12): we:ll (.) variety's important (literally, 'one must vary oneself'). This is an idiomatic way of generalising to close a topic (cf. Drew and Holt, 1988: 412), and here it makes for a quite abrupt closing, which gives a humorous effect. It shows that Ronny orients to the PT's recycling of the vocalisation and his meta-comment as a correcting action that he does not intend to take into further discussion for the moment.
The PT's recycling of the vocalisation and his meta-comment on it fulfil the function of highlighting an element (in this case, the co-ordination of outbreath and effort) as relevant to the ongoing activity, in the professional vision. In this way, the PT shares this knowledge with the other participants (cf. Goodwin, 2013). Jim orients to this by again taking on an encouraging (professional) role with an upgraded evaluation (very good, line 10; cf. line 07).

Extract 2: accounting for increasing the volume in a series of vocalisations
In Extract 2 below, we show how the PT criticises the client, making him accountable for increasing the volume too far in a series of vocalisations. This criticism is produced as an imperative (stop) and is thus more explicit than that in Extract 1, and it is followed by the PT's recycling of a vocalisation. The increase in vocalising volume by the client is a change that is made accountable. However, it is highlighted as an object in a professional vision in a slightly different way than in Extract 1.
The serial vocalisation in this extract resembles that seen in Extract 1 in that the client (in this case Jim) produces a vocalisation for every repetition of a physical movement. However, the extract also contains a feature commonly seen in our data: a verbal "countdown". Here it should be noted that the training sessions are designed in a similar way. They all consist of several different exercises, each performed in two or three sets of 10–15 repetitions. In a countdown, the PT counts out the number of repetitions performed (or remaining). The counting typically does not start at the beginning of a set but tends to go on until the end of a set. Jim performs a round of lifting very heavy weights in the same work-out machine as in Extract 1. Before he begins his first set, both the PT and his training partners have been challenging him to lift heavier weights than he is used to. According to Reynolds (2017), a situation where someone challenges him- or herself is a situation where other participants can, or even should, watch in order to create a sense of membership in a gym group. In our data, the participants show their awareness of this obligation by turning physically to Jim's activity, watching closely even from the start of the set.
Jim takes up the challenge and actually manages several lifts, causing both the PT and the training partners to show slight surprise in frequent and repeated assessments during the first four lifts. Image 4.3 shows Ronny to the left, watching Jim. The PT and Tom are watching too but hidden among the training machines to the left.
At the end of each lift, Jim produces a groaning vocalisation introduced by a glottal stop, marking the final effort of each lift, and each repetition ends with a relaxation and an inbreath. Extract 2 begins with such a vocalisation at the end of the fifth lift (line 01) and the subsequent relaxation and inbreath (line 02). When Jim performs his sixth lift, the vocalisation pattern changes into a louder vocalisation (line 04) (realised as both an increased subglottal pressure and a longer vocal sound); this is where the PT starts a verbal countdown, four to go (line 06). In line 12, he produces his criticising imperative, and further on he again recycles a vocalisation (line 15). We will take a closer look at the interactional work performed here.
Extract 2. Group training session in a gym with a PT. PT = personal trainer, male; JIM = Jim, client; RON = Ronny, client; TOM = Tom, client.
The length and subglottal pressure of Jim's vocalisations progressively increase from the sixth lift (line 04), making the vocalisations increasingly audible, and in the eighth lift the vocalisation is realised with both more volume and a prolonged vowel sound (UE::H, line 10). This audible change prompts an imperative instruction from the PT which follows directly after a countdown element: two to go >stop making a noise< (line 12). Jim complies with, or at least reacts to, that instruction in his next (ninth) lift, not by being silent but by delaying his vocalisation. Figure 2.1 shows his most strained position in the ninth lift, which is less reversed than in his first lifts (Image 4.3). Only after the lift does he produce a vocalisation, during the relaxation phase (lines 13–14). This time the vocalisation has a different vocal quality, which makes it sound more like a scream: OEA::H (line 14).
The instruction from the PT is oriented to as a joke or as teasing by the two training partners, who laugh (line 13). Still, Jim complies in some way, which shows that he orients to the PT's criticism as serious. This might reflect that the PT is a mentor to the much younger Jim, and that the teasing jargon is part of this mentoring relationship. When Jim starts his next lift in silence, this can be seen as an orientation to having been made accountable for his sound by the PT. When Jim then resumes his vocalisation in an even louder way, this can be interpreted as an orientation to a tone in the PT's instruction which is both serious and joking at the same time. The laughing from Ronny and Tom, and Jim's first complying and then resuming his vocalisation, can be explained as a co-produced action where the professional vision (focusing criticism) in the local context is oriented to and negotiated among the members.
The PT continues his countdown by saying one more, and Jim takes on his last lift in silence (line 15). He has obvious problems performing this tenth and final repetition, and a screamy high-volume vocalisation produced by the PT seems to give voice to Jim's efforts (EE:E, line 15 and Figure 2.2) (cf. empathic sounds in Weatherall et al., 2021, and the notion of sounding others' sensations). Lines 16–17 mark the end of the set: the PT asks for one more, one more, but Jim gives up, saying na:h na:h on an outbreath, leaning against the machine.
In Extract 1, we analysed a vocalisation by the PT as "filling in" when the client Ronny changed, or digressed from, his expected vocalising pattern. A similar phenomenon in fact occurs in Extract 2, when the PT produces a screaming sound at the point where a vocalisation could be expected in the light of Jim's previous behaviour during this set of repetitions (line 15). In the same way as in Extract 1, the PT recycles a vocalisation, this time transformed in terms of vocal quality and surrounded by an encouragement repeated three times: one more (lines 15–16). Following Goodwin's ideas about the co-operative nature of action, this can be interpreted as an action involving close co-participation where the PT gives voice to Jim's physical effort. The recycling also constructs a new social action tied to what has come before: the action of pushing Jim to perform the last lift in the series of ten. However, Jim gives up halfway through the lift (line 15) and gives a verbal account for this: na:h na:h (line 17).
We see the upgrading in the series of vocalisations as a co-constructed social action, where timely and frequent assessments (see also Keevallik, 2014; Huhtamäki et al., 2022) and incitements (Reynolds, 2021) in the first lifts are linked to an upgrade in the volume and quality of the vocalising. When there is a change in the series of vocalisations, the PT begins his countdown (line 06), the other participants fall silent, and the PT takes over the upgrading activity.
The PT's imperative instruction to stop making a noise and his recycling of a vocalisation actually highlight the serial vocalising from the perspective of the professional vision. The increased volume of the vocalising is a change in the series that, from a professional vision, constitutes an object of knowledge, indicating that something has changed in the physical performance as well. The change in the series of vocalisations is made accountable by the PT, who by doing so makes the professional vision accessible to the training partners as well. What is more, the actions connected to the professional vision also reflect the goal of the activity, which for Jim, a young man working to be a professional athlete, is to perform the whole set of ten lifts.

Extract 3: accounting for not complying with the PT's instruction
In Extract 3 below, we show how the PT is instructing the client, thereby making him accountable for changing the series of vocalisations that he has started. In this extract as well, the vocalisations are linked to the serial feature of the exercise and of the associated breathing. The PT's instructions are produced as imperatives pertaining to breathing, which the client (here Ronny again) does not comply with, causing the PT to give an evaluating meta-comment, including a recycling of a missing vocalisation, after the last lift. We interpret this as the highlighting of an object in a professional vision, in an evaluating meta-comment made after the exercise is finished (Goodwin, 1994).
The exercising client, Ronny, is carrying out a new series of repetitions in the same work-out machine. Before the extract starts, he has made six lifts in the set, each accompanied by audible non-glottal puffing breathing (cf. Extract 1 and Hagins and Lamberg, 2006). However, during the seventh lift (line 01), his regular audible breathing during lifting is no longer present. After the climax of this (silent) lift, the PT makes a positive assessment, good (line 02) (cf. Huhtamäki et al., 2022), and gives an imperative instruction to breathe: breathe (also line 02). The PT has apparently noticed the change in breathing pattern over the course of Ronny's exercise. Ronny goes on to perform his eighth and ninth lifts, still without any audible inbreath, and the PT instructs him again, this time by saying push and telling Ronny to take one at a time (line 06). In the tenth lift, Ronny does vocalise again, this time simultaneously with the most strenuous part of the lift, but with a vocalisation that is not in line with the expected series (E:H, line 07). Finally, in lines 12–17, a verbal interaction between Ronny and the PT takes place, showing their orientation to how the exercise, and the vocalising, was performed.
Extract 3. Group training session in a gym with a PT. PT = personal trainer, male; RON = Ronny, client; JIM = Jim, client
Ronny complies in some way with the instruction given in line 02, since he breathes out twice, audibly but without voice, before beginning his eighth repetition (line 03). However, that repetition and the next (ninth) one are both performed with outbreaths after, not during, the strained position in the lift (lines 03–07). At the end of the ninth lift, the PT produces two imperative encouragements: come on Ronny (line 05) and push (line 06), as well as a phrasal instruction (cf. Huhtamäki et al., 2019) reminding his client to approach the repetitions one at a time (line 06). During the tenth and final lift, Ronny produces a vocalised screamy roar, E:H (line 07), this time simultaneously with the most strenuous part of the lift. This climactic vocalisation marks the end of the exercise and is also overlapped by a positive assessment from the co-client Jim (good Ronny, line 08). Following the last lift (and the screamy vocalisation), the PT again instructs Ronny to breathe with an imperative, breathe (line 09). Ronny then again makes an audible outbreath (line 10), and Jim repeats his positive assessment: good Ronny (line 11).
What we see here is an ongoing instructive action on the part of the PT, who repeats the imperative breathe twice (lines 02 and 09) and also produces pushing and instructive actions in lines 05–06. We argue that the instructive action (line 02) is prompted by the fact that Ronny has stopped breathing audibly, thereby making a change in the expected series of sounds. The PT's actions can be explained with reference to the concept of professional vision: to the PT, the regular breathing and co-ordinated audible vocalisations characterising the beginning of Ronny's set represent signals of correct bodily performance; consequently, when the audible breathing stops, this signals trouble with bodily performance. This analysis of events is also made accessible to Jim by way of the professional vision, meaning that Jim's positive assessments (lines 08 and 11) can be interpreted as having an encouraging function.
After an additional vocalised outbreath sh: (line 12), Ronny meta-comments on his own vocalising as he gets up from the training machine (Figure 3.1): I screamed instead of breathing (line 12). This meta-comment shows that he accounts for his failure to follow the PT's instructions to breathe. In lines 13–16, the PT elaborates on Ronny's meta-comment, thereby verbally taking on the professional role: you held your breath the last three reps, you .h (1.0) h (lines 15–16). The elaboration starts with a summons, but Ronny (line 13), to which Ronny answers yes and turns to the PT (lines 14–15 and Figure 3.2), indicating that he is listening to the instructions. Specifically, the PT recycles Ronny's deviant behaviour by breathing in, holding his breath and breathing out (line 16). Ronny confirms having heard the elaboration by saying ye:s yes and breathes out loudly sh (line 17).
The PT's recycling of "holding one's breath" (line 16) is transformed into a new interactional setting, and, as in Extract 1, it is performed outside the physical performance, since the exercise is already finished when it takes place. The recycling and the transformation build up a social action of evaluating the client's performance by offering a negative example. This means that, in this extract as well, the vocalising behaviour, or here rather the failure to vocalise, is highlighted in the interaction, making it accessible as perceived in a professional vision not only to Ronny but to all of the group members.

Discussion and conclusion
In sum, the extracts presented above show a similar pattern, namely that changes in clients' series of vocalisations are made accountable by the PT. The social actions performed by the PT in doing this are different across the three sequences: correcting, criticising and instructing, respectively. Even so, one feature common to all of these actions is that the PT recycles material from the clients' vocalising, transforms it and builds new social actions, also accompanied by verbal actions. We argue that the interactional work performed here by the PT highlights observable aspects of the exercise performance, which reveals a professional vision to the group members, thereby helping them to become competent members and actors in the ongoing activity.
Our analysis shows that, in the specific institutional setting of gym training, vocalising has a potential for the PT as an input tool to monitor bodily performance and for correcting, criticising and instructing. We argue that, in the group training sessions, a professional vision on changes in series of vocalisations is shared among the members when the PT highlights those changes as accountable in interaction. According to Keevallik and Ogden (2020), sounds connected with straining and lifting are phonetically under-specified when it comes to explaining what is going on in the vocalising body, and this is part of what makes them useable in interaction (cf. also Dingemanse, 2020, on liminal signs): resources with limited semantic content are relatively free to use and interpret locally.
A recent special issue brings together studies with a common analytic focus on "sounding for others". In our analysis, the PT's recycling of "sounds" could be described as sounding on behalf of the clients, since he orients to changes in series, and sometimes to "missing" sounds. This could be explained in terms of a professional vision showing the right way.
In our extracts, what is made accountable are the changes in series of vocalisations: changes such as stopping vocalising, vocalising in the wrong embodied environment or vocalising too loudly. These changes are all related to the expectations emanating from the fact that the exercises, and the vocalisations co-ordinated with them, should be performed in series. Our highlighting of changes in vocalising behaviour in our data can be seen as prompted by the under-specification of each separate vocalisation in the local context. It is their serial nature that is oriented to by the participants. To make sense in the institutional context, those (changes in) vocalisations need to be in some way unpacked, in our case into meaning-making objects in relation to a professional vision.
We argue that our study has relevance for interaction research and that our use of Goodwin's notion of professional vision (2018) makes our findings relevant for practitioners in similar settings by shedding light on how the interplay of human resources can be made meaningful. For clients, the vocalising can be a way of manifesting their commitment or effort to their trainer and thus to acknowledge that professional's execution of his or her institutional role. For the trainer, the vocalising can be a resource to draw upon when evaluating the ongoing activity. The embodied and verbal sharing of a professional vision on, in our data, the co-ordination of breathing and effort makes objects of knowledge available to all the members of the group. Also, making the clients' vocalising behaviour accountable in the situational interaction is a way of inviting the clients to participate in the social construction of the context of physical health-care activities.
In our introduction to this paper, we mentioned that empirical studies of vocalisations can contribute to the understanding of the interplay of different resources in interaction and, in our case, specifically institutional action. In real life, in all kinds of interaction, this interplay is often unconscious: people use their bodies, they talk and they vocalise without asking the question of which interactional resources contribute what to their interaction or whether those resources are lexicalised or not. The aim of posing the academic question of how multi-modal resources interplay with each other is to get a glimpse of the communicative competence that people manifest in everyday life.

Data availability
The data that has been used is confidential.

Acknowledgements
This work was supported by The Bank of Sweden Tercentenary Foundation [grant number M12-0137:1]. The funding source had no involvement in study design. The authors thank Jan Lindström and Jenny Nilsson for valuable comments on the paper, as well as two anonymous reviewers.

◊--> Action continues on subsequent line(s)
->◊ until the same symbol is reached
-->> Action continues after transcript's end