The Language of Interpersonal Interaction: An Interdisciplinary Approach to Assessing and Processing Vocal and Speech Data

Verbal and non-verbal information is central to social interaction between humans and has been studied intensively in psychology. Dyadic interactions in particular (e.g. between romantic partners or between psychotherapist and patient) are relevant to a number of psychological research areas. However, the psychological methods applied so far have not been able to handle the vast amount of data resulting from human interactions, impeding scientific discovery and progress. This paper presents an interdisciplinary approach that uses technology from engineering and computer science to work with continuous data from human communication and interaction on the verbal (e.g. use of words, content) and non-verbal (e.g. vocal features of the human voice) level. Text-mining techniques such as topic models take into account the semantic and syntactic information of written text (such as therapy session transcripts) and its structure and intercorrelations. Speech signal processing focuses on the vocal information in a speaker’s voice (e.g. based on audio- or videotaped interactions). For both areas, we provide an introduction defining the respective method and related procedures, along with sample applications from psychological publications that complement or generate behavioral codes (e.g. in addition to cardiovascular indices of arousal or as a way to encode empathy). We close with a summary of the opportunities and challenges of learning and applying the novel approaches described in this manuscript in different areas of psychological research and provide the interested reader with a list of additional readings on the technical aspects of topic modeling and speech signal processing.

Verbal and non-verbal data streams are not specific to psychotherapy but are common to most interpersonal interactions, which are a focus of study across a wide array of scholarly disciplines. Moreover, researchers use the tools they know and are often unaware of novel methodologies developed in other disciplines. The lack of interdisciplinary collaboration on methods for studying interpersonal interaction has slowed the pace of scientific discovery. In addition, it likely contributes to the propagation of systematic errors associated with the limitations of current methods.
Within psychology, the most common method for studying interpersonal interaction is observational behavior coding. In this method, behavior is quantified by human coders (sometimes called annotators or raters) according to rules developed by investigators to capture the essential aspects of the dyadic interaction. Using this methodology, trained coders watch or listen to a dyad interacting and score the partners on dimensions defined by the coding system. However, behavioral coding has a number of shortcomings (Baucom & Iturralde, 2012): (a) it is time consuming, often involving months of training prior to the actual work; (b) it can be error prone, as human coders do not always agree with each other on coding decisions; (c) it does not 'scale up' to larger samples due to time constraints; (d) behavioral coding systems often do not translate cross-culturally (Zimmermann, Baucom, Irvine, & Heinrichs, 2015), impeding replication and generalization of findings; and (e) the coding systems are simplifications of the true complexity of interactions because raters are limited in the quantity and temporal specificity of factors they can observe and code during a dyad's interactions. What is needed is a new set of methods for studying interpersonal interaction that allow us to move beyond these limitations.
Fortunately, methods and tools for working with complex linguistic data exist within engineering and computer science, falling broadly within the categories of speech signal processing, statistical text-mining, and natural language processing (Busso, Lee, & Narayanan, 2009; Gaut, Steyvers, Imel, Atkins, & Smyth, 2017). At a fundamental level, signal processing techniques use computer algorithms to derive "informative" quantities from highly multivariate, continuous streams of input. One of the tremendous advantages of signal processing methods is that they can be used to process vast amounts of information, far beyond what an individual rater could garner from watching a dyadic interaction. In much the same way that observational coders are trained to recognize and rate classes of behaviors, speech signal processing and statistical text-mining use algorithms and models to estimate mathematical quantities, called features, that characterize aspects of the original signal, either voice or text. Some features provide a psychologically meaningful measurement of behavior in and of themselves (e.g. vocally encoded emotional arousal measured by the fundamental frequency [f0] of the speech sound wave), while other features can be combined with statistical techniques to recognize social behaviors of interest. Thus, one application of this methodology has been to replicate human coding using only acoustic and text inputs. For example, working with transcripts of drug addiction counseling sessions, these methods have led to automated coding of therapists' behaviors during motivational interviewing, including behaviors such as simple and complex reflections, open and closed questions, and affirmation (Atkins, Steyvers, Imel, & Smyth, 2014; Can, Georgiou, Atkins, & Narayanan, 2012). Similar advances have been realized using acoustic data for automated coding of spousal behavior during discussions of a difficult relationship problem (e.g. Lee et al., 2010, 2014). These methods not only represent significant methodological advancements, but also open up new avenues for theoretical refinement and advancement. These new methodologies overcome some of the inherent limitations in behavioral coding and allow researchers to ask familiar questions in new ways and to ask entirely new questions.

Method
Based on the international and interdisciplinary work and collaboration during the VolkswagenStiftung's summer school "The language of interaction," the goal of this manuscript is to provide a general overview of speech signal processing and statistical text-mining methods that have the potential to greatly increase our ability to understand and conduct research on important dyadic interactions. The methods are introduced using example applications from two types of interpersonal interaction that are central to psychology research: (a) partners within a committed romantic relationship, and (b) patients and therapists in a psychotherapeutic context. Our goal is to introduce the methodologies; however, the current article is not intended as a tutorial, which would be beyond the scope of a single article. Throughout, we point to additional resources to assist in learning more about the methods. The current article is also not intended to present a comprehensive review of existing work using these and other allied methods. We close by proposing a research agenda for further integrating these methods into psychological science and clinical practice.
This manuscript only covers research using the methods mentioned above and meeting the following criteria: with explicit definitions for each category, and human raters manually code the text according to these categories. These inductively derived categories can inform treatment process (i.e. what actually occurs during therapy) and serve as predictors of treatment outcomes (Svartvatten, Segerlund, Dennhag, Andersson, & Carlbring, 2015). However, the significant disadvantage of content analysis, and of any method relying on human judgment, is that it is difficult to scale up to large collections of transcripts. One popular method for automatically analyzing text is Linguistic Inquiry and Word Count (LIWC; Pennebaker, Booth, & Francis, 2007), a program that recognizes 2,300 words and classifies them within 70 predefined classes (e.g. negative and positive emotion words). LIWC has been used to study emotional, cognitive, structural, and process components present in verbal and written speech (Tausczik & Pennebaker, 2010). At the same time, LIWC ignores the context in which the words occur, and because the words and categories are fixed, it cannot be adapted to a particular domain.
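To make the idea of dictionary-based word counting concrete, the following is a minimal sketch in Python. It is not LIWC itself: the two categories, their word lists, and the function name `category_proportions` are hypothetical stand-ins chosen for illustration. The example also shows the limitation noted above, namely that counting words against a fixed dictionary ignores context such as negation.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; it does not
# reproduce the actual LIWC categories or word lists.
CATEGORIES = {
    "negative_emotion": {"sad", "angry", "hurt", "worried"},
    "positive_emotion": {"happy", "glad", "love", "hopeful"},
}

def category_proportions(text):
    """Return the share of tokens in `text` that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in CATEGORIES.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in CATEGORIES}

# Both sentences contain five tokens and one positive-emotion word, so they
# receive the same score even though the second one negates the emotion.
plain = category_proportions("I am so happy today")
negated = category_proportions("I am not happy today")
print(plain, negated)
```

A fixed dictionary of this kind is transparent and fast, which is part of its appeal, but any adaptation to a new domain requires editing the word lists by hand.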
A more recent approach to analyzing the text in psychotherapy transcripts is based on topic models (Atkins et al., 2012; Atkins et al., 2014; Imel, Steyvers, & Atkins, 2015). Topic models (also called latent Dirichlet allocation) are a type of latent mixture model, in which the model identifies groups of words that reliably co-occur across therapy sessions (or "documents", more generally). The model represents the observed transcripts as mixtures of various underlying topics. For example, a transcript from a drug intervention might be represented as a mixture of a drug use topic and a rehabilitation topic, where words such as drug or problems are highly likely to be expressed in the drug use topic, while words such as healing and change are more likely to be expressed in the rehabilitation topic. The resulting topics can be used to identify themes in text and also as a data reduction of text for predictive modeling. Topic models share similarities with other dimensionality reduction techniques such as Latent Semantic Analysis (Landauer & Dumais, 1997), cluster analysis, or principal component analysis but were specifically designed to create interpretable dimensions when applied to text. This is advantageous when the goals are not just dimensionality reduction or prediction but also interpretation of the underlying patterns of the linguistic data.
Topic models applied to psychotherapy transcripts have sometimes been used for exploratory data analysis, where the goal is to summarize, explore, and discover the types of topics that are discussed (called unsupervised learning in the machine learning literature; Hastie, Tibshirani, & Friedman, 2009). Alternatively, they can also be used to predict some variable of interest based on the text in the transcript, such as behavioral codes or treatment outcomes (called supervised learning in the machine learning literature).
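The mixture idea behind topic models can be illustrated with a small sketch. The code below fits latent Dirichlet allocation to three toy "transcripts" using collapsed Gibbs sampling, which is one common estimation approach and not necessarily the one used in the studies cited here. The toy documents, the hyperparameter values (`alpha`, `beta`), and the function names `fit_lda` and `top_words` are assumptions made for illustration.

```python
import random

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with a collapsed Gibbs sampler."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                           # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document (each row sums to 1).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word, vocab

def top_words(topic_word, vocab, k, n=3):
    """Return the n words most often assigned to topic k."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_word[k][v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

# Toy transcripts (invented for illustration).
docs = [
    ["drug", "problems", "drug", "use"],
    ["healing", "change", "healing", "recovery"],
    ["drug", "problems", "healing", "change"],
]
theta, topic_word, vocab = fit_lda(docs, n_topics=2)
for k in range(2):
    print("topic", k, top_words(topic_word, vocab, k))
print("document-topic proportions:", theta)
```

With a corpus this small the recovered topics are unstable, so the sketch is meant only to show the moving parts: a topic-by-word count table that yields the interpretable word lists, and a document-by-topic table that yields the mixture proportions for each session.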

Topic Models for Exploratory Data Analysis
In the unsupervised learning case, the model associates individual words with latent topics by analyzing the statistical co-occurrences between words across transcripts. Words that tend to co-occur with one another tend to be placed in the same topic. For example, Table 1 shows topics inferred from a large randomized clinical trial of two behaviorally-based couple therapies (Atkins et al., 2012). The table shows a few illustrative topics the model inferred and the 15 words that were most frequently assigned to each topic. As in factor analysis, the topic names (in bold typeface) are labels supplied by the authors and are not automatically learned by the model. The topic model also produces a distribution of topics over sessions (or whatever defines the basic textual unit of analysis). With these probability distributions, it is possible to investigate temporal changes in topics over sessions or identify individual talk turns with specific content (e.g. particular interventions or important topics such as drug and alcohol use).
In the multidimensional scaling of topic-model-derived semantic summaries reported by Imel et al. (2014; Figure 1), each point represents a single session. Sessions conducted using the same treatment type tend to be linguistically similar, although there was some variability within each treatment type. The medication management session outlined by a black circle was more similar to Humanistic/Experimental therapy or CBT than other medication management trials. Inspection of the transcript revealed that there was no direct discussion of medication or dosage and that the session was more focused on providing psychotherapy. This demonstration shows how topic models can be used to compare psychotherapy sessions across treatment types.

Topic Models for Behavioral Coding
An extension of the basic, unsupervised topic model is called a labeled topic model and can be used to predict behavioral codes or content discussed in treatment (Atkins et al., 2014; Gaut et al., 2017). The topics in the model are placed in correspondence with behavioral or content codes (e.g. subject or symptom) or latent background codes that do not correspond to any of the observable codes. The model learns which words are associated with each topic and can infer which content or behavioral codes are representative for individual utterances, talk turns, or sessions.
For example, Table 2 shows output from a labeled topic model applied to the same psychotherapy corpus used by Imel et al. (2014). However, for this analysis, 209 content or symptom codes were included in the model fitting process (Gaut et al., 2017). The model learned words that are indicative of each code and even found unexpected semantic relationships between words and codes, such as grouping the words 'dishes' or 'cats' in the topic corresponding to a code about irritability. This model included latent background topics to capture linguistic variation not specific to any one code, and these topics correspond to broad concepts such as family, work, home, fitness, and sleeping routine. Gaut et al. (2017) also demonstrated that the labeled topic model can find local information within a session that is associated with a session-level code or tag. For example, from a session tagged with a 'suicidal behavior' code, the model was able to disentangle talk turns that were specifically associated with suicide from talk turns associated with other unrelated topics. Evaluated against human-produced codes using the area under the ROC curve (AUC), the labeled topic model predicted subject and symptom codes with strong concordance.
The previous study examined generic codes, focused on common discussion topics or patient symptoms that might occur in psychotherapy sessions. Atkins et al. (2014) examined the utility of a labeled topic model to generate a much more specific type of observational code, fidelity codes for motivational interviewing (MI). They trained the model using 148 sessions from five MI intervention studies that were transcribed and coded using the Motivational Interviewing Skills Code (MISC; Miller, Moyers, Ernst, & Amrhein, 2008). The studies were heterogeneous in focus, including alcohol, marijuana, and poly-drug targets. The model learned topics for 12 MISC behavioral codes, as well as for each of the five different studies (see Table 1 for examples). The topics for MISC behavioral codes and specific studies generally have face-valid interpretations. For example, topics corresponding to asking questions contain phrases such as 'have you', 'questions', 'would you', and 'are you', and topics for the ESPSB study that focused on spring break drinking included words related to partying during spring break such as 'break', 'spring', 'drinks', and 'spring break'. Table 3 shows the most probable talk turns for a sample of MISC behavioral codes.
When predicting MISC codes for individual talk turns, the model tended to perform best at coding open and closed questions, complex reflections, affirmations, structure, and empathy. The model performed significantly better than chance guessing and generally performed better when codes were tallied across sessions as opposed to at each individual talk turn. On several codes (e.g. complex reflections, information giving), the model reliability was comparable to human reliability, but for other codes, human reliability was significantly better than model performance. This suggests that the labeled topic model can be competitive with human raters for some codes, but that other codes may be more difficult to capture in a topic modeling framework.
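Because the studies above summarize agreement between model and human codes with the area under the ROC curve, a short sketch of how AUC can be computed may be helpful. The version below uses the rank-based definition: the probability that a randomly chosen talk turn carrying a code receives a higher model score than a randomly chosen talk turn without it, with ties counted as one half. The labels and scores are invented for illustration, and the function name `auc` is our own.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 if a human rater assigned the code, 0 otherwise.
    scores: model scores for the same talk turns (higher = more likely).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Invented example: six talk turns, three of which a rater coded as a question.
human_codes = [1, 0, 1, 0, 0, 1]
model_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
print(round(auc(human_codes, model_scores), 3))  # 8 of 9 pairs ordered correctly: 0.889
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why it is a convenient scale for comparing codes that differ widely in how often they occur.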

Future Directions in Text Analysis of Dyadic Interaction
Topic models primarily focus on word co-occurrence within documents (e.g. sessions, talk turns) and ignore any temporal dependence between words. Statistical text-mining for psychotherapy transcripts could be improved by incorporating information about temporal and syntactic structure. Recent neural network methods attempt to model this dependence by modeling words as high-dimensional vectors whose representation depends on surrounding context words (Mikolov, Corrado, Chen, & Dean, 2013), as well as syntactic information (Socher, Bauer, Manning, & Ng, 2013). Other methods model words as high-dimensional probability distributions where semantic meaning can be captured by set relations with other words (Vilnis & McCallum, 2015). These methods have shown promise in several natural language processing tasks such as sentiment analysis but have yet to be applied to psychotherapy transcripts.

Speech Signal Processing
Whereas statistical text-mining primarily addresses "what was said" in a conversation, psychologists have commonly used observational coding to also measure "how it was said." The application of observational coding systems needs to take into account a number of different aspects of the interaction, such as the communication setting, target population, or specific behaviors displayed by the interaction partners on different levels (i.e. micro- versus macro-analytic coding; for further details see Humbad, Donnellan, Klump, & Burt, 2011), adding to the complex nature of the observational coding practice itself. However, what unites coding systems focusing on non-verbal, vocal aspects of interpersonal interactions is the attempt to capture the paralinguistic components of spoken language by taking into account vocal aspects such as voice tone, loudness, or intonation.
Other disciplines have been developing computational methods for quantifying information communicated in the voice, but these methods have only recently been applied in psychological research. The emerging field of behavioral signal processing (BSP; Narayanan & Georgiou, 2013) is developing methods to decipher the complex and heterogeneous chain of events that comprises human behavior and transform it into its basic signal components, making it analyzable using tools from engineering and computer science. Speech signals expressed in the human voice are especially informative as they convey both the content of a message (verbal/linguistic information) and paralinguistic information about the psychological state of the speaker using non-verbal or vocal cues (e.g. prosody; Juslin & Scherer, 2005).
In order to communicate with another human being, a speaker needs to generate a message in a form that can be understood by others. In the case of speech production, the neuro-muscular controls start and coordinate the interplay of respiratory, phonetic, and articulatory muscles in the vocal tract system to provide the anatomical and physiological sources for generating a continuous acoustic speech sound wave with the desired and/or necessary characteristics for the environment the individual is communicating in (e.g. appropriate loudness).
The recipient of such a message processes the acoustic waveform of the speech signal via the basilar membrane and resulting neural transduction into its discrete features, which are then coded into phonemes, words, and sentences that can be interpreted and understood by the listener (Narayanan, 2014).
For speech signal processing purposes, the discrete features of the continuous speech sound wave comprising the physical basis of any spoken message are analyzed. To be able to extract this information, the speech sound wave is periodically and repeatedly segmented and analyzed with regard to a feature of interest, yielding a numerical index for each segment of analysis. One such example could be vocal fundamental frequency, or f0, scores (perceived as the voice pitch of a speaker) for any given time period of interest. A number of other vocal features based on different aspects of the human voice can be analyzed as well (see Juslin & Scherer, 2005, for a review). Research on dyadic interactions, however, has focused on f0 as one of the most emotionally salient aspects of the human voice (Busso et al., 2009), and f0 is therefore the focus of this manuscript.
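The segment-and-analyze procedure described above can be sketched in a few lines. The example below cuts a signal into overlapping frames and estimates f0 for each frame from the peak of its autocorrelation. Autocorrelation is one common pitch estimation technique and is used here as an assumption; the text does not specify which estimator the cited studies used. The search range of 75 to 400 Hz, the frame and hop sizes, and the synthetic 200 Hz test tone are likewise illustrative choices.

```python
import math

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate f0 (Hz) of one frame from the peak of its autocorrelation.

    Returns None when no lag in the search range has positive
    autocorrelation (e.g. for silence).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None

def f0_track(signal, sample_rate, frame_len, hop):
    """Return one f0 estimate per frame, stepping through the signal by `hop`."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

# Synthetic one-second test tone at 200 Hz, standing in for recorded speech.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(sample_rate)]

# 50 ms frames taken every 25 ms.
track = f0_track(signal, sample_rate, frame_len=400, hop=200)
print(len(track), track[0])
```

Real speech is noisier than a pure tone and contains unvoiced stretches, so production tools add voicing decisions and smoothing on top of a basic estimator like this one.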

Behavioral Signal Processing of Vocally Encoded Emotional Arousal
One potential application of speech signal processing results from the Component Process Model of Emotion (CPME; Scherer, 2009). The CPME views emotions as recurring, prototypical, and adaptive reactions for goal-achieving behavior (Buss, 2005) that influence various bodily systems physiologically, among them the human voice (Juslin & Scherer, 2005) and its acoustic features and related signals, such as f0.
According to the CPME, an individual evaluates an internal or external event and its consequences on different levels by the interaction of cognitive functions (termed cognitive appraisal). Cognitive appraisal reflects the individual's subjective perception of a situation, object, or event. If the event is appraised as being meaningful for an individual's goals and/or motives, emotional processes (including motivational changes) occur and lead to physiological changes in the autonomic (e.g. cardiac or respiratory system) and somatic (motor-driven expressions in face, voice, or body posture) nervous system. The pattern of activation in these systems that arises during an emotional episode is influenced by numerous factors including the valence (i.e., degree of pleasantness vs. unpleasantness) and arousal (i.e., degree of activation vs. deactivation) of the emotion.
Emotional arousal is expressed through a number of channels such as facial expressions, word usage, and prosodic features of the voice (e.g. fundamental frequency; for a more detailed description and empirical findings concerning the CPME, please see Scherer, 2004, 2005). Vocal fundamental frequency is an acoustic property of the human speech sound wave that can be assessed mechanically and, thus, analyzed and interpreted objectively. During phonation, the first phase of human speech production, air is released from the lungs. The outward flow of air passes over the vocal folds in the larynx, which can be positioned and flexed by muscles under voluntary control. The different levels of tension in the vocal folds lead to vibrations in the passing air (Juslin & Scherer, 2005). The lowest harmonic frequency of these patterns is called (vocal) fundamental frequency (or f0), measured in cycles per second (Hertz, Hz) over the time period of interest. Higher tension in the vocal folds corresponds to higher vibration rates and to higher f0 values (Weusthoff, Baucom, & Hahlweg, 2013). On the auditory level, perceived voice pitch is analogous to fundamental frequency, and higher f0 is perceived as a higher voice pitch (Frick, 1985).
Emotions and emotional arousal displayed in the human voice (Busso, Lee, & Narayanan, 2009) seem to be an especially flexible tool for passing on information about the result of an individual's appraisal (e.g. in the case of sensing danger in one's environment; Juslin & Laukka, 2003). Since the human voice is one of the sounds most often experienced by human beings in life (Belin, Zatorre, & Ahad, 2002), it is very useful in studying human interactions per se, and especially important in psychological research as it provides a continuous stream of information that is interpreted and acted upon by both interaction partners.
In the study by Weusthoff, Baucom, and Hahlweg (2013), the sample consisted of N = 67 severely distressed, heterosexual German couples participating in couple-relationship education (namely, EPL - Ein Partnerschaftliches Lernprogramm; Hahlweg, Markman, Thurmaier, Engl, & Eckert, 1998).
As expected, positive associations emerged between f0 and physiological variables (in this study, heart rate, blood pressure, and cortisol): Higher f0 range was associated with higher levels of all physiological indicators of emotional arousal. Also, communication behavior was significantly related to f0 range. Namely, higher levels of f0 range were associated with higher levels of self-reported negative communication behavior. For observed communication behavior, higher levels of f0 range were linked to higher levels of negative communication behavior and lower levels of positive communication behavior. Additionally, simultaneous examination of physiological variables and observationally-coded communication behaviors revealed that associations between both sets of variables and f0 range were largely independent of one another. Although male and female f0 range were significantly different from each other (with females having higher f0 range values than males), no significant gender differences emerged in any of the predictors of interest associated with f0 range mentioned above.
This collection of findings suggests that f0 range may be most reasonably interpreted as a vocal distress signal during couple conflict, independently reflecting both socially learned ways of interacting with others (communication behavior) and basic processes in physiological responding (heart rate, blood pressure, cortisol). The findings demonstrate that these associations reflect variability in response due to a particular conflict, a general style of responding during conflict, and individual differences.

Behavioral Signal Processing for Behavioral Coding
Another application of speech signal processing is the use of BSP methods to generate observational coding data. Using BSP methods to generate observational coding data from acoustic features is similar in many ways to the labeled topic models described above. In both cases, a large number of input features (e.g. words, f0) are used to predict what score an observational coder would give to an interaction using a supervised learning model. The primary distinction we are making here is that it is possible to use linguistic features, acoustic features, or a combination of the two for this purpose.
One way that acoustic and linguistic features differ is in the methods used to extract them. Acoustic features are generally derived from continuous waveforms, and the methods used to extract features from this type of data therefore reflect the unique properties of continuous waves. This process is analogous to how heart rate (a feature) is extracted from an ECG recording (a continuous signal). Heart rate is typically measured as the number of R-spikes that occur during a 60-second window, where an R-spike is defined as the most prominent upward inflection that occurs between two smaller upward inflections and a downward inflection. In much the same way, there are numerous acoustic features that can be used to index spectral, prosodic, and vocal quality aspects of recorded speech.
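The heart rate analogy can be written out directly. The sketch below assumes that R-spikes have already been detected and are available as a list of time stamps in seconds; the detection step itself is not shown. The function name `heart_rate_bpm` and the simulated spike times are illustrative assumptions.

```python
def heart_rate_bpm(r_spike_times, window_start, window_len=60.0):
    """Count R-spikes in [window_start, window_start + window_len) and
    express the count as beats per minute."""
    window_end = window_start + window_len
    n_beats = sum(window_start <= t < window_end for t in r_spike_times)
    return n_beats * 60.0 / window_len

# Simulated recording: one R-spike every 0.8 s for two minutes.
spike_times = [i * 800 / 1000 for i in range(150)]

print(heart_rate_bpm(spike_times, window_start=0.0))   # 75 spikes in the first minute
print(heart_rate_bpm(spike_times, window_start=60.0))  # 75 spikes in the second minute
```

The same pattern, a continuous signal reduced to one number per window, underlies the acoustic features discussed in this section.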
BSP for observational coding works by extracting a large number of acoustic features and "learning" in what way those features are related to observational coding data. For example, researchers have shown that it is possible to predict codes from both the Couple Interaction Rating Scale 2 (CIRS-2; Heavey, Gill, & Christensen, 2002) and the Social Support Interaction Rating System (SSIRS; Jones & Christensen, 1998) using BSP methods with acoustic features (Black, Georgiou, Katsamanis, Baucom, & Narayanan, 2011). CIRS-2 and SSIRS codes are used to quantify spouses' behaviors during problem-solving discussions and include ratings of positive and negative behaviors and affect. A comparison of the resulting BSP-generated codes and human-generated codes showed an average accuracy of 75% for rating wives' behaviors and 73% for rating husbands' behaviors, with even higher accuracy for specific codes. Negativity in husbands was rated with 86% accuracy, for example. Advances continue to be made in the use of BSP for such complex codes, and accuracy is likely to improve as more research is done in this area.

Future Directions
We have provided an application-focused overview of statistical text-mining and speech signal processing, highlighting how they can be used to tap similar types of information as behavioral codes and even be used as methodologies to generate behavioral codes. In this final section, we discuss key considerations in using these methodologies in psychological science, as well as future directions broadly.
No methodology is ideal for all applications, and there must be thoughtful, theory-based applications of these methodologies within psychological science. For example, Imel et al. (2014) used f0 to study the process of empathy in motivational interviewing sessions and used a coherent theoretical understanding of empathy to do so.
In particular, the perception-action model of empathy (Preston & De Waal, 2002) has emphasized physiological synchrony as a core component of empathy. Thus, Imel et al. (2014) used f0 as a marker of vocally encoded arousal and demonstrated that it is more highly correlated (across one-minute intervals and entire sessions) within MI sessions rated as highly empathic than within those rated low on empathy. Baucom, Atkins, and Christensen (2010, as cited in Baucom & Atkins, 2012) also examined covariation in f0 to examine a theoretically derived, interpersonal emotional process, in this case interpersonal emotion regulation. Baucom et al. (2010) examined covariation in spouses' f0 while they were discussing an area of disagreement in their relationship.
Polarization theory (e.g. Baucom & Atkins, 2012) suggests that a key characteristic of relationship distress is difficulty regulating emotion during relationship conflict. Consistent with this idea, Baucom et al. (2010) found that stronger cross-partner associations in f0 were associated with poorer interpersonal emotion regulation.
Considering these examples jointly demonstrates that while there are significant cross-person associations in f0 during many, if not most, interactions, these associations do not represent the same process. This example illustrates that these methodologies can be powerful new tools for studying human behavior and dyadic interaction, but as always, they must be used within the context of psychological theory.
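As a simple illustration of what a cross-person association in f0 can look like numerically, the sketch below computes a Pearson correlation between two speakers' mean f0 values across one-minute intervals. This is a deliberately minimal stand-in: the cited studies may have used more elaborate models, and the interval values, variable names, and the function `pearson_r` are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented mean f0 (Hz) per one-minute interval for two interaction partners.
speaker_a_f0 = [180.0, 190.0, 200.0, 210.0]
speaker_b_f0 = [120.0, 125.0, 130.0, 135.0]

# The two series rise together in lockstep, so the correlation is 1.
print(pearson_r(speaker_a_f0, speaker_b_f0))
```

As the examples in this section show, the same coefficient can index empathy in one setting and poor emotion regulation in another, so the number only becomes interpretable in light of theory.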
Beyond theoretical consideration, who can use these methods? Is it reasonable to think that psychologists could effectively learn these methodologies on their own, or do they require collaborating with colleagues from engineering and computer science? Our informed opinion is that both are possible, though the latter is prefera-

Summary
More than one hundred years ago, John Watson made a powerful argument for the value of new methods for studying behavior in noting that: "As our methods become better developed it will be possible to undertake investigations of more and more complex forms of behavior. Problems which are now laid aside will again become imperative, but they can be viewed as they arise from a new angle and in more concrete settings" (Watson, 1913, p. 175). The most exciting aspect of the methods introduced in this manuscript is their potential to fulfill Watson's vision.
Continued development and advancement of these methods will be challenging and will necessarily involve tight interdisciplinary collaboration. However, such advancement has tremendous potential to yield breakthrough discoveries because it would enable the field to address previously intractable problems and to do so in rich and thoughtful ways. For example, these computational methods have the potential to be used to monitor adherence to psychotherapy protocols in healthcare networks (e.g. Imel, Steyvers, & Atkins, 2015) and to provide private practice clinicians with a means for objectively assessing treatment progress and outcomes (Baucom & Iturralde, 2012).
One important element of interdisciplinary collaboration on these methods is sufficient understanding of the mechanics involved in the methods themselves. As noted above, it is not necessary for psychologists to be able to develop the computer code or mathematical algorithms underlying these methods, but knowledge of how the methods work increases a psychologist's ability to collaborate effectively. With this idea in mind, we close with a list of references for further technical readings related to the computational methods that were introduced in this manuscript.
Imel et al. (2014) used topic models to compare the linguistic similarity of psychotherapy sessions (N = 1,318) across four types of treatment (medication management, psychodynamic therapy, cognitive-behavioral therapy, and humanistic/existential therapy). An unsupervised topic model was used, and the resulting topics were used to quantitatively summarize the semantic content of each session, which in turn was used to compare each session with every other session. Figure 1 shows a multidimensional scaling of the sessions in the topic space.

Figure 1. Multidimensional scaling of 1,318 sessions in a 200-topic space. Colors correspond to different treatment approaches. One outlier medication management session is circled in black. Adapted from Imel et al. (2014).
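The kind of analysis summarized in Figure 1 can be illustrated with a minimal sketch: each session is represented by its vector of topic proportions, every session is compared with every other session, and the resulting distance matrix is projected into two dimensions. Everything specific here is an assumption made for illustration and is not taken from Imel et al. (2014): the data are simulated, the Hellinger distance is one of several reasonable ways to compare topic proportions, classical (Torgerson) scaling stands in for whichever scaling algorithm was actually used, and the function names are hypothetical.

```python
import numpy as np


def hellinger_distances(theta):
    """Pairwise Hellinger distances between rows of a topic-proportion matrix.

    theta: (n_sessions, n_topics) array; each row is non-negative and sums to 1.
    """
    root = np.sqrt(theta)
    # Squared Euclidean distance between square-rooted rows, via the Gram matrix.
    gram = root @ root.T
    sq_norms = np.diag(gram)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
    return np.sqrt(np.clip(sq_dist, 0.0, None) / 2.0)


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: place points so Euclidean distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain an inner-product matrix.
    b = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # eigh returns eigenvalues in ascending order; keep the largest n_components.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Simulated data (an assumption): 30 sessions from each of two hypothetical
# treatments in a 20-topic space, with each treatment favouring different topics.
rng = np.random.default_rng(0)
alpha_a = np.r_[np.full(10, 2.0), np.full(10, 0.1)]
alpha_b = alpha_a[::-1].copy()
theta = np.vstack([rng.dirichlet(alpha_a, size=30), rng.dirichlet(alpha_b, size=30)])

dist = hellinger_distances(theta)
coords = classical_mds(dist)  # shape (60, 2): one 2-D point per session
```

Plotting `coords` with one color per treatment would give a picture in the spirit of Figure 1, in which sessions with similar topic content fall close together.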
Atkins et al. (2014) tested the ability of the model to predict behavioral codes by computing scores for each talk turn in each session that corresponded to MISC behavioral items. They assessed how well the model predicted the codes assigned by raters and compared model performance to human reliability. To determine predictive ability, Atkins et al. (2014) computed the area under the curve (AUC) for each code. The model coded talk turns with an average AUC of 0.73 and tended to perform best at coding open and closed questions, among other codes.
Weusthoff, Baucom, and Hahlweg (2013) aimed to clarify what forms of information are encoded in f0 during naturalistic dyadic interactions and to investigate whether f0 provides a valid form of assessment of emotional arousal. F0 range is a specific index of f0 calculated as the difference between an individual's maximum and minimum f0 score during a time period of interest. During couple conflict discussions, f0 range and its concurrent links to other well-established indices of emotional arousal (cardiovascular and endocrine) and to other important aspects of couple functioning (self-reported and observed communication behavior) were investigated.
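The definition of f0 range given above (maximum minus minimum f0, the fundamental frequency, within a time period of interest) can be written down in a few lines. The sketch below is illustrative only: the contour values are invented, the function name is hypothetical, and treating zero or missing values as unvoiced frames is an assumed convention, since pitch trackers differ in how they mark frames without voicing.

```python
import numpy as np


def f0_range(f0_hz, times_s, start_s, end_s):
    """Return max(f0) - min(f0) over voiced frames within [start_s, end_s].

    Frames with f0 <= 0 or NaN are treated as unvoiced and ignored
    (an assumed convention; pitch trackers differ in how they mark them).
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    times_s = np.asarray(times_s, dtype=float)
    in_window = (times_s >= start_s) & (times_s <= end_s)
    voiced = np.isfinite(f0_hz) & (f0_hz > 0)
    selected = f0_hz[in_window & voiced]
    if selected.size == 0:
        return float("nan")  # no voiced frames in the window
    return float(selected.max() - selected.min())


# Hypothetical f0 contour: one value every 10 ms; 0 and NaN mark unvoiced frames.
times = np.arange(8) * 0.01
contour = np.array([0.0, 180.0, 195.0, 240.0, np.nan, 210.0, 0.0, 175.0])

range_first_part = f0_range(contour, times, 0.0, 0.055)  # frames at 0-50 ms
range_whole = f0_range(contour, times, 0.0, 0.08)        # all frames
```

In practice the contour would come from a pitch-tracking tool applied to each partner's speech, and the window would correspond to the segment of the interaction under study.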
ble. Consider statistics as an analogy. All psychologists (and most other social scientists) learn statistical methods in their training and have varying degrees of facility and expertise in applying data analytic techniques. At the same time, it is very challenging for psychologists to be expert in the many areas of statistical methodology and to stay current on new developments. The same is largely true for speech signal processing and statistical text mining. Psychologists with interests and skills in quantitative methodologies should certainly be encouraged to learn how to estimate and analyze f0 or to apply basic topic models to their text data. At the same time, there are significant advantages to collaborating with colleagues who are experts in these areas, as the underlying math and algorithms (e.g., the fast Fourier transform for spectral features from voice, the Dirichlet distribution as a key component of topic models) are not common to most psychology training.
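To give a sense of what the fast Fourier transform contributes, the following minimal sketch computes the magnitude spectrum of a short frame and finds its strongest frequency. The signal is synthetic, and the sampling rate, frame length, Hann window, and function name are all assumptions made for illustration; this is not a description of any particular speech-processing toolkit.

```python
import numpy as np


def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, magnitudes) for a real-valued signal frame."""
    windowed = signal * np.hanning(len(signal))  # taper to reduce spectral leakage
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, magnitudes


# Synthetic "voiced" frame (an assumption): a 200 Hz fundamental
# plus two weaker harmonics at 400 Hz and 600 Hz.
sample_rate = 16000
t = np.arange(4096) / sample_rate
frame = (1.0 * np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.25 * np.sin(2 * np.pi * 600 * t))

freqs, mags = magnitude_spectrum(frame, sample_rate)
peak_hz = freqs[np.argmax(mags)]  # frequency bin with the largest magnitude
```

Here the strongest peak falls within one frequency bin (about 3.9 Hz) of the 200 Hz fundamental because the fundamental was constructed to be the loudest component. In real speech the strongest spectral peak is often a harmonic rather than f0, which is one reason dedicated f0 estimation algorithms, and colleagues who know them well, are valuable.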

Table 1
Topics From a Topic Model Applied to Couple-Therapy Transcripts and the 15 Most Likely Words for Each Topic. Adapted from Atkins et al. (2012).
Note. The bold-face labels are subjective interpretations of the word clusters and are not automatically determined.

Atkins et al. (2014)
MISC Code
Question Closed: have you, questions, would you, are you, risk, drink, you think, okay so, do you have, heard
Question Open: what do you, what do, group, do you think, you think, expected, how do you, bar, how do
Reflection Simple: sounds, it sounds like, mentioned, okay so, sounds like, it sounds, like you, you said
Reflection Complex: sounds, it sounds like, sounds like, it sounds, sounds like you, like you, so it sounds
iCHAMP: marijuana, smoke, use, smoking, sleep, rem, smoked, marijuana use, evergreen, month
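For codes such as those listed above, Atkins et al. (2014) summarized how well model scores matched human codes with the AUC. Assuming this refers to the area under the receiver operating characteristic curve, it can be read as the probability that a randomly chosen talk turn that received the code from human raters gets a higher model score than a randomly chosen talk turn that did not. The sketch below computes it directly from that definition; the scores and codes are invented, the function name is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))


# Hypothetical model scores for one code on six talk turns, with
# human codes as the reference (1 = a rater assigned the code).
model_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
human_codes = [1, 1, 1, 0, 0, 0]
example_auc = auc(model_scores, human_codes)
```

In this toy example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9. A value of 0.5 corresponds to chance-level ordering and 1.0 to perfect separation, which gives a frame of reference for the average of 0.73 reported for the model.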

Table 3
[...] Assigned by the Model for a Sample of Behavioral Codes in Atkins et al. (2012)
Closed Question: Does that sound about right. So well did you have any other questions for me or.
Open Question: So what do you make of that. What do you mean by that. What do you mean by absurd.
Simple Reflection: Okay and you said that you used to drink a little bit more last year it sounds like. Yeah you mentioned that and you felt like that kinda. It sounds like it it sounds like you're doing it it sounds like you have um you say you feel better right is that physically and emotionally.
Complex Reflection: Mm-hmm so you're two birds of a feather it sounds like. Yeah it sounds like you've turned it around.