Narrative Language as an Expression of Individual and Group Identity

Scientific Narrative Psychology integrates quantitative methodologies into the study of identity. Its methodology, Narrative Categorical Analysis, and its toolkit, NarrCat, were both originally developed by the Hungarian Narrative Psychology Group. NarrCat is designed for the machine-made transformation of sentences in self-narratives into psychologically relevant, statistically processable narrative categories. The main body of this flexible and comprehensive system is formed by the Psycho-Thematic Modules: Agency, Evaluation, Emotion, Cognition, Spatiality, and Temporality. The Relational Modules include Social References, Semantic Role Labeling (SRL), and Negation. Certain elements can be combined into Hypermodules, such as Psychological Perspective and Spatio-Temporal Perspective, which allow for even more complex, higher level exploration of composite psychological processes. Building on up-to-date developments in corpus linguistics and Natural Language Processing (NLP), NarrCat has a unique feature: its SRL capability. The structure of NarrCat, as well as empirical results in group identity research, are discussed.

Another domain of linguistic research, whose boundaries extend into personality, clinical, and health psychology, is marked by the Linguistic Inquiry and Word Count (LIWC) software of Pennebaker and his coworkers (Chung & Pennebaker, 2007; Niederhoffer & Pennebaker, 2009; Pennebaker, 1990, 2012; Pennebaker, Mehl, & Niederhoffer, 2003; Pennebaker, Paez, & Rime, 1997; Tausczik & Pennebaker, 2010). The underlying idea of LIWC is that its dozens of word categories, partly thematic and partly grammatical, can detect, measure, and statistically compare psychologically relevant phenomena in large bodies of text.
For a review of this comprehensive paradigm and the abundant related research of the past decades, see Tausczik and Pennebaker (2010). In his recent book, The Secret Life of Pronouns: What Our Words Say About Us (2011), Pennebaker follows the idea of "ignoring the content, celebrating the style," which is highly congruent with the LCM. However, whereas the LCM, LIB, LEB, SAB, and the other models, based on either lexical or grammatical choice, concern how messages are modulated to shape perception in communication, Pennebaker's lexical and grammatical analysis asks how these choices express the internal states of the communicators.
A remarkable novelty of scientific narrative psychology comes from the recognition of correspondences between narrative organization and psychological organization, namely, from the fact that narrative features, for example, the characters' functions, the temporal characteristics of the story, or the speakers' perspectives, provide information about the features and conditions of self-representations. In this sense, scientific narrative psychology exploits achievements of narratology (e.g., Barthes, 1977; Culler, 2001; Eco, 1994; Genette, 1980). However, whereas narratology studies the effects of narrative composition on readers' understanding and experience, scientific narrative psychology is directed to how narrative composition expresses the inner states of the narrator.
Another novelty is that the term narrative composition here also refers to interpersonal and intergroup relations. This means that not only the psychological content of the words, for example, the content of a particular emotion or deed, is considered, but also the relations: Who feels or does this toward whom? Because the narrative analysis is extended to interpersonal and intergroup relations, its results can be corroborated with empirically founded theories of personal and social identity. Furthermore, it provides an opportunity for empirically extending and testing dynamic psychological theories in personality and social psychology.
In the following sections, Narrative Categorical Content Analysis (NarrCat), as a means of empirically conceptualizing and operationalizing scientific narrative psychology, is presented, together with some results obtained with this methodology.

Conceptualization of Content Analysis in NarrCat
NarrCat is based on the psychologically relevant markers of narrative categories and narrative composition. The primary focus of NarrCat is neither on the psychological correlates of words, nor on function words (as opposed to content words), nor on grammatical markers (e.g., past tense). Instead, it explores the evaluative, emotional, and cognitive processes of the self and the other, and of the ingroup and the outgroup, as well as more complex principles of narrative composition, such as spatio-temporal perspective and psychological (inside) versus observer (outside) perspective.
As with other psychological content analysis systems, such as LIWC (Pennebaker, Booth, & Francis, 2007) or RID (Martindale, 1990), NarrCat also has dictionaries. However, because of the complex morphology of the Hungarian language and the need for disambiguation, the lexicons are endowed with local grammars that perform disambiguation and enable further grammatical analysis. NarrCat handles two further language-processing tasks: anaphora resolution and Semantic Role Labeling (SRL). Anaphora resolution (replacing personal pronouns with the relevant proper names in order to identify the participants of the narratives) is performed by an external linguistic parser. SRL is implemented partly by external linguistic parsers and partly by support modules within the NarrCat system.
The system yields quantitative results about who or which group acts, evaluates, has emotions, or thinks something with respect to somebody or another group. Thus, the output depicts the psychological composition of the interpersonal and intergroup relations that are relevant to the construction of identity.

Technological Background
The software that currently serves for content analysis in the framework of scientific narrative psychology is NooJ, a multilingual linguistic development environment (Silberztein, 2003; http://www.nooj4nlp.net/pages/resources.html). The international NooJ community currently covers 19 languages, several of them with non-Latin scripts. What makes NooJ usable for analyzing Hungarian texts is the Hungarian National Corpus, which currently contains 187.6 million words (http://corpus.nytud.hu/mnsz/index_eng.html; Váradi, 2002).

Overall Structure
NarrCat is a flexible and comprehensive methodological toolkit for machine-made transformation of sentences in self-narratives into psychologically relevant, statistically processable narrative categories.
Two examples of the process are as follows:
"Mom will be proud of me" = "Other agent's abstract positive emotion toward Self as recipient in the future"
"The Hungarians were afraid of the Mongols" = "Ingroup agent's basic negative emotion toward outgroup as recipient in the past"
The structure of NarrCat is shown in Figure 1. The first level of composite modularity is the "workshop": the linguistically annotated Dictionaries (DICs) serve as input to the Local Grammars (LGs). The LGs contain disambiguation rules, exclusions, textual variables, and so on. Partly through Submodules, the DICs and LGs provide input to two groups of higher level entities: the Psycho-Thematic Modules and the Relational Modules. Forming the main body of NarrCat, the Psycho-Thematic Modules include Agency, Evaluation, Emotion, Cognition, Spatiality, and Temporality. The Relational Modules include Social References, SRL, and Negation.
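The two examples above can be read as a fill-in template: an agent, a psycho-thematic category, a recipient, and a time frame. The Python sketch below shows only that final composition step, assuming the module outputs are already available; the `Annotation` record and its field names are hypothetical and do not reproduce NarrCat's or NooJ's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical summary of module outputs for one sentence."""
    agent: str      # SRL agent resolved via Social References, e.g. "Other agent"
    category: str   # Psycho-Thematic match, e.g. "abstract positive emotion"
    recipient: str  # SRL recipient, e.g. "Self"
    time: str       # Temporality output, e.g. "future"


def to_narrative_category(a: Annotation) -> str:
    """Render an annotation as a readable narrative-category label."""
    return f"{a.agent}'s {a.category} toward {a.recipient} as recipient in the {a.time}"


# "Mom will be proud of me"
label = to_narrative_category(
    Annotation("Other agent", "abstract positive emotion", "Self", "future")
)
print(label)  # Other agent's abstract positive emotion toward Self as recipient in the future
```

In such a pipeline the hard work lies upstream (disambiguation, anaphora resolution, SRL); keeping the structured record, not just the rendered string, is what allows later counting by agent, category, or recipient.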
The higher levels allow category building at different levels of complexity. An example of a simple analysis is to assess and compare the frequency of negative emotionality in a textual corpus. A still simple but multimodular analysis is, for example, assessing and comparing the numbers of negative emotion words and expressions, self-references, and negations, which are pathognomonic of depression. Certain elements can be combined into Hypermodules, such as Psychological Perspective and Spatio-Temporal Perspective, which allow for even more complex, higher level exploration of composite psychological processes. Hypermodules get input from the lower level modules in a comprehensive manner.
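A multimodular frequency analysis of this kind can be sketched as counting, per text, the matches of several modules and normalizing by text length. The sketch below is illustrative only: the tiny English word lists stand in for NarrCat's Hungarian dictionaries and local grammars, the per-100-token normalization is an assumption, and no disambiguation is attempted.

```python
import re
from collections import Counter

# Toy word lists (hypothetical); NarrCat's real dictionaries are Hungarian
# and are paired with local grammars that handle inflection and ambiguity.
MODULES = {
    "negative_emotion": {"sad", "afraid", "hopeless", "disappointed", "suffering"},
    "self_reference": {"i", "me", "my", "myself", "mine"},
    "negation": {"not", "no", "never", "nothing", "nobody"},
}


def module_rates(text: str) -> dict[str, float]:
    """Return matches per 100 tokens for each module, using plain lowercase lookup."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts: Counter = Counter()
    for token in tokens:
        for name, words in MODULES.items():
            if token in words:
                counts[name] += 1
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {name: 100 * counts[name] / n for name in MODULES}


rates = module_rates("I am not sad, but my friends never call me.")
print(rates)  # {'negative_emotion': 10.0, 'self_reference': 30.0, 'negation': 20.0}
```

The naive lookup counts "sad" in "not sad" as negative emotion; a context-sensitive analysis would have to take the scope of negation into account.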
Prior to going into the details of the system, please note the following:
1. NarrCat is an open system that is extendable with new elements at all levels.
2. Dictionaries and Local Grammars are mostly general, that is, applicable in their present form to the analysis of a large variety of individual and group narratives in social and clinical psychology. They are, however, open to project-specific extensions as well.
3. The dictionaries were compiled from the ten thousand most frequent verbs, adjectives, nouns, and adverbs in Hungarian, provided to us by the Research Group for Language Technology of the Research Institute for Linguistics of the Hungarian Academy of Sciences. Unless otherwise stated below, the dictionary entries were classified by five judges.

The Psycho-Thematic Modules
As illustrated in Figure 2, NarrCat at present has six Psycho-Thematic Modules: Agency, Evaluation, Emotion, Cognition, Spatiality, and Temporality. The Psycho-Thematic Modules of NarrCat have been used extensively in a series of empirical studies investigating characteristics of Hungarian national identity in historical text corpora, including history textbooks, folk history stories, historical novels, and newspaper texts (Csertő & László, 2011, in press; Fülöp et al., 2012; László, Ehmann, & Imre, 2002; László & Vincze, 2004; Vincze et al., 2013). The most notable results are summarized below under the respective Psycho-Thematic Modules.

The Agency Psycho-Thematic Module
The structure of the Agency Psycho-Thematic Module is shown in Figure 3. The module is composed of the Activity and the Intentionality Submodules, both of which are dimensional in nature.

The Activity Submodule
The Activity Submodule was built from the Activity and Passivity Dictionaries and Local Grammars.
The Activity Submodule detects linguistic structures of activity and passivity in texts, thereby enabling a quantitative analysis of each character's activity. The Activity-Passivity result is expressed as the ratio of the two counts. The higher the Activity-Passivity ratio, in other words, the more active expressions are used at the expense of passive ones, the more the character is presented as an efficient actor in the narrated events who has an effect on her environment. Conversely, the more passive expressions the narrative uses for a character at the expense of active ones, the more it emphasizes her passivity and her incapacity to act.

The Intentionality Submodule
The Intentionality Submodule was built from the Intentionality and Constraint Dictionaries and Local Grammars.
Intentionality can be expressed not only by intentional auxiliary verbs such as want, will, wish, and so on, but also by intentional nouns (goal, plan), adverbs (goal-mindedly), adjectives (intentional), and postpositions, as well as by some uses of the conditional mood and some subordinate clauses. Constraint occurs when the action is performed not by the actor's own will, but under external or internal pressure.
The Intentionality-Constraint result is expressed as the ratio of the two counts. The higher this ratio, that is, the more intentional words and expressions the narrator uses, the more goal-minded and efficient the agent appears. Lower ratios indicate more constraint in the description of the event, that is, the event takes place less according to the agent's will and more under external pressure.
The Agency result is the sum of the Activity-Passivity and the Intentionality-Constraint ratios. A high Agency score indicates an active, goal-minded, effective agent, whereas a low Agency score indicates a passive, externally controlled agent.
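The two ratios and their sum can be written down directly. In the minimal Python sketch below, the per-character counts are assumed to come from the Activity and Intentionality Submodules; the add-one smoothing that avoids division by zero is an assumption of this sketch, since the text does not say how zero counts are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgencyCounts:
    """Match counts for one character (assumed output of the Agency submodules)."""
    activity: int
    passivity: int
    intentionality: int
    constraint: int


def smoothed_ratio(numerator: int, denominator: int, k: float = 1.0) -> float:
    """Ratio with add-k smoothing (an assumption) so that zero counts are allowed."""
    return (numerator + k) / (denominator + k)


def agency_score(c: AgencyCounts) -> float:
    """Agency = Activity-Passivity ratio + Intentionality-Constraint ratio."""
    return (smoothed_ratio(c.activity, c.passivity)
            + smoothed_ratio(c.intentionality, c.constraint))


active = AgencyCounts(activity=7, passivity=1, intentionality=3, constraint=3)
passive = AgencyCounts(activity=1, passivity=7, intentionality=0, constraint=3)
print(agency_score(active), agency_score(passive))  # 5.0 0.5
```

Because raw ratios are unbounded above, a single extreme ratio can dominate the sum; log-ratios would be a symmetric alternative, but that would depart from the formula described above.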
Results concerning ingroup-outgroup agency showed that the agency of the Hungarian ingroup was much lower than that of outgroups. The pattern of results was very similar in history school textbooks and folk narratives. Folk narratives tended to depict ingroups and outgroups as having more agency than textbooks did, except for the Hungarian ingroup in negative events, where the agency level was extremely low.

The Evaluation Psycho-Thematic Module
The structure of the Evaluation Psycho-Thematic Module (Csertő & László, 2011) is shown in Figure 4. The module marks keywords conveying evaluative content with annotation tags according to word class and valence. The evaluative keywords, which may be adjectives, verbs, nouns, or adverbs, are sorted by word class into several dictionaries within the module.
At present, the Evaluation Module is capable of identifying evaluative keywords in any inflected form in texts; furthermore, it identifies verbs and verbal adverbs with separated prefixes, and then annotates the identified structures with output tags corresponding to their valence.
The module was used in the study of interpersonal and intergroup evaluation processes. The results were similar to those obtained with Agency: Ingroups were evaluated much more positively than outgroups in both positive and negative events, but there was a statistically significant interaction with event valence: Hungarians were evaluated even more positively in positive events, and outgroups were evaluated even more negatively in negative events. A comparison of folk stories and textbooks showed that these effects were even more marked in folk stories (Csertő & László, 2011).

The Emotionality Psycho-Thematic Module
The structure of the Emotion Psycho-Thematic Module (Fülöp et al., 2012) is shown in Figure 5. The module is composed of the Emotional Valence, the Emotional Humanity, and the Moral Emotions Submodules.
The dictionary of the Emotionality Psycho-Thematic Module was compiled from the Hungarian monolingual explanatory dictionary by two independent coders. The list consists of 1,100 words. Contextual disambiguation and the identification of inflected forms are handled by Local Grammars.

The Emotional Valence Submodule
The simplest and most informative of the three, this submodule sorts emotions into Positive and Negative groups, for example, "joy," "contentedness," "hope," versus "suffering," "disappointment," or "uneasiness." Valence analysis may provide psychological information about the characters' optimism, openness to the world, relationships, self-assessment, and so on.

The Emotional Humanity Submodule
This submodule groups emotions into Primary ("anger," "joy," "sadness") and Secondary ("pride," "guilt," "honor") Emotions. It is well suited to investigations within the infrahumanization paradigm, in which denying outgroups the capacity to have secondary emotions is a sensitive indicator of their devaluation (Leyens et al., 2000).
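Within the infrahumanization paradigm, one simple operationalization is the share of secondary emotions among all emotions attributed to a group. The Python sketch below assumes (hypothetically) that Emotional Humanity matches have already been linked to the group holding the emotion, for example via SRL; the data are made up for illustration and are not NarrCat results.

```python
from collections import defaultdict


def secondary_share(attributions: list[tuple[str, str]]) -> dict[str, float]:
    """Per group, the proportion of attributed emotions that are secondary.

    Each item is (group, emotion_type), with emotion_type "primary" or "secondary".
    """
    totals: dict[str, int] = defaultdict(int)
    secondary: dict[str, int] = defaultdict(int)
    for group, emotion_type in attributions:
        totals[group] += 1
        if emotion_type == "secondary":
            secondary[group] += 1
    return {group: secondary[group] / n for group, n in totals.items()}


# Illustrative attributions only.
example = [
    ("ingroup", "secondary"), ("ingroup", "primary"),
    ("ingroup", "secondary"), ("ingroup", "secondary"),
    ("outgroup", "primary"), ("outgroup", "primary"),
    ("outgroup", "secondary"), ("outgroup", "primary"),
]
shares = secondary_share(example)
print(shares)  # {'ingroup': 0.75, 'outgroup': 0.25}
```

A markedly higher secondary-emotion share for the ingroup than for the outgroup is the pattern the infrahumanization paradigm looks for.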

The Moral Emotions Submodule
Because the representation of moral emotions plays an important role in identity states and development (László, 2008), the Moral Emotions Submodule is composed of Self-Critical and Other-Critical emotions, such as "shame" and "guilt" versus "contempt" and "disgust" (see Rozin, Lowery, Imada, & Haidt, 1999). Furthermore, this submodule may show whether the focus of emotional attention is the self and the ingroup or the other person and the outgroup (see Harth, Kessler, & Leach, 2008).
The Emotionality Module is highly flexible: New insights into the psychology of emotions may require new groupings of dictionaries. The module was applied in the analysis of Historical Trajectory Emotions within national identity studies. The results showed that a pattern of fear, hope, enthusiasm, sadness, and disappointment prevailed in the self-representation of the nation in history books and folk narratives. These emotions were significantly more frequently related to Hungarians than to other nations; therefore, this configuration of emotions was labeled historical trajectory emotions. Overall, outgroups were mostly endowed with hostile, negative emotions. These results suggest that the national self-representation is organized around mistrust, bitterness, and dissatisfaction due to unfulfilled aspirations (Fülöp et al., 2012).

The Cognition Psycho-Thematic Module
The Cognition Psycho-Thematic Module (Vincze, Tóth, & László, 2007) has no submodules. It is composed of mental verbs ("generalize," "ponder") and nouns ("thought," "decision," "idea"), as well as mental idioms that do not contain mental words, such as "get it all wrong," "get over something," "get something straight," and so on. All words and idioms that referred to mental processes were considered cognitive. The module is shown in Figure 6.
The Cognition Module can be applied in research on mentalization and cognitive empathy. Representing a character's mental processes, what she thinks, believes, dreams, and so on, facilitates cognitive empathy toward the character. The strategic use of mental phrases, which enhances the psychological perspective of one character over others, has consequences for meaning formation. Ascribing psychological perspective to a character increases the likelihood of identification with that character and decreases the responsibility attributed to the character for negative actions. However, this latter effect held only when the character's cognitive considerations were positive (Vincze et al., 2013).

The Spatiality Psycho-Thematic Module
The structure of the Spatiality Psycho-Thematic Module is shown in Figure 7. The Spatial Deictic LGs and Dictionaries list words and demonstrative pronouns whose meaning contains a proximal versus distant component, such as here versus there and this versus that. The Spatial Interpersonal Relational Movement Submodule consists of two sets of LGs and DICs: Social Approaching (e.g., "moving home") and Social Retreating (e.g., "leaving"; Pohárnok et al., 2007).
The psychological relevance of social approaching and retreating was investigated in self-narratives of borderline patients. The results suggested that persons with inadequate emotion regulatory capacities uttered significantly more words and expressions in the Social Approaching and the Social Retreating categories (Pohárnok et al., 2007).

The Temporality Psycho-Thematic Module
The Temporality Psycho-Thematic Module is composed of Contentual and Functional Submodules. The module is shown in Figure 8.
The Contentual Submodule consists of Thematic and Sequential dictionaries and local grammars: the former cover, for example, the days of the week, the names of months, life periods, and holidays; the latter cover the linguistic markers of the sequential, chronological flow of time (dates, then, etc.). The Contentual Submodule also supplies elements to certain Functional Submodules (e.g., "all day long" = Duration).
The Temporality Module was used to study subjective time experience and the temporal organization of traumatic narratives. In the first set of investigations, the results showed that impulsive people (low scorers on the Impulse Control Factor of the Big Five Questionnaire, BFQ) used significantly more adverbs and adverbial phrases thematizing Start and Future. In the second set of studies, the temporal structure of traumatic self-narratives was plotted as deviations from a simple-past timeline, either into past perfect or third conditional sentences or into present or future tense sentences. The results suggested that elaborated traumas appear as a relatively flat (simple past tense) timeline, with past perfect and third conditional sentences as markers of rumination, whereas the frequent use of present and future tense verbs may indicate that the effect of the trauma is still active (Ehmann & Garami, 2010).
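The timeline analysis described above can be sketched by coding each sentence's tense as a deviation from the simple-past baseline. In the Python sketch below, the tense labels are assumed to come from an upstream tagger, and the -1/0/+1 coding and the label names are illustrative assumptions rather than the published procedure.

```python
# Deviation from the simple-past baseline (illustrative coding).
OFFSET = {
    "simple_past": 0,
    "past_perfect": -1,       # backward: possible marker of rumination
    "third_conditional": -1,  # backward
    "present": 1,             # forward: trauma possibly still active
    "future": 1,              # forward
}


def timeline(tenses: list[str]) -> list[int]:
    """Map each sentence's tense label to its offset from the simple past."""
    return [OFFSET[t] for t in tenses]


def deviation_profile(tenses: list[str]) -> dict[str, float]:
    """Share of sentences deviating backward or forward from the simple past."""
    offsets = timeline(tenses)
    if not offsets:
        return {"backward": 0.0, "forward": 0.0}
    n = len(offsets)
    return {
        "backward": sum(o < 0 for o in offsets) / n,
        "forward": sum(o > 0 for o in offsets) / n,
    }


story = ["simple_past", "simple_past", "past_perfect", "simple_past", "present"]
print(timeline(story))           # [0, 0, -1, 0, 1]
print(deviation_profile(story))  # {'backward': 0.2, 'forward': 0.2}
```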

The "Blank" Psycho-Thematic Module(s)
The "Blank" Module(s) are listed among the Psycho-Thematic Modules to illustrate the flexibility of the NarrCat System; it can be supplemented with further elements in the future.

The Relational Module Group
NarrCat includes three Relational Modules: Social References, Semantic Role Labeling, and Negation (Figure 1).

Social References Module
The two submodules are the Interpersonal Reference and the Group Reference (Hargitai et al., 2007). The module is shown in Figure 9.
The grammars of the Interpersonal Reference Submodule find the singular and plural forms of first-, second-, and third-person pronouns and conjugated verbs. This submodule was built on the basis of corpus-linguistic annotations rather than word frequencies. For example, the "V+1+sg" grammar finds all verbs conjugated in the first person singular, such as the Hungarian verb "sétálok" ("I am walking").
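The idea behind a grammar such as "V+1+sg", selecting tokens by their annotation features rather than by word lists, can be illustrated in a few lines of Python. The token structure and feature labels below are assumptions for illustration; NooJ's actual grammar syntax and annotation format are not reproduced here.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str             # surface form as it appears in the text
    lemma: str            # dictionary form
    features: frozenset   # hypothetical annotation features, e.g. {"V", "1", "sg"}


def matches(token: Token, pattern: str) -> bool:
    """True if the token carries every feature of a '+'-separated pattern like 'V+1+sg'."""
    return set(pattern.split("+")) <= token.features


tokens = [
    Token("sétálok", "sétál", frozenset({"V", "1", "sg"})),   # "I am walking"
    Token("sétálunk", "sétál", frozenset({"V", "1", "pl"})),  # "we are walking"
    Token("sétál", "sétál", frozenset({"V", "3", "sg"})),     # "he/she is walking"
]

print([t.form for t in tokens if matches(t, "V+1+sg")])  # ['sétálok']
print([t.form for t in tokens if matches(t, "V+1")])     # ['sétálok', 'sétálunk']
```

Matching on features rather than on listed word forms is what lets one short pattern cover every verb in the corpus, whatever its lemma.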
The Group Reference Submodule is more composite in nature. It is truly relational: "Hungarians" or "our king" may be ingroup or outgroup, depending on the context. While the Interpersonal Reference Submodule is an "instant kit" that can be used in any texts, the Group Reference Submodule is based on project-specific dictionaries and local grammars.
The Self-Reference category was used to investigate high or low self-experience in depression (Hargitai et al., 2007). The ratio of Self- and We-Reference words and expressions was used as an element of a composite marker of Team Spirit in the diaries of the Hungarian crew in a space-analogue experiment at the Mars Desert Research Station.
The Social References Module also serves as an input of the SRL Module.

SRL Module
Semantic roles (SRs), or thematic roles, are age-old linguistic constructs (Gildea & Jurafsky, 2002). SRL is an active research area in Natural Language Processing (NLP); its aim is to identify the words and expressions filling SRs automatically (Márquez, Carreras, Litkowski, & Stevenson, 2008). In automatic recognition, the number of SRs targeted is typically around a dozen, for example, Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Source, Goal, Manner, and so on. Foley and Van Valin (1984) proposed two macro-roles: Actor and Undergoer. Pointing to the fuzzy and overlapping nature of SRs, Dowty (1991) suggested two Proto-roles: the Proto-Agent and the Proto-Patient.
SRL is important for Scientific Narrative Psychology because it reveals the "agent," "actor," or "owner" of agency, evaluation, emotionality, and cognition, that is, who acts, who evaluates, who feels emotions, and who thinks something in individual and group narratives. Similarly, it identifies the "object," "target," "patient," "undergoer," "sufferer," or "beneficiary" of the agent's acts, intentions, emotions, and thoughts.

Figure 7. The Spatiality Psycho-Thematic Module of NarrCat (components: Spatial Deictic Words and Pronouns; Social Approaching; Social Retreating; each with LGs and DICs).
Figure 8 components: Thematic, Sequential, Start-End, Slow-Quick, Duration, Cyclicity, and Allness (each with LGs and DICs).
Borrowing the concept of "Proto-roles" from Dowty (1991), NarrCat uses two general terms: "Agent" and "Recipient." As illustrated in Figure 10, the "Agent" and the "Recipient" may be either an individual or a group.
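Once SRL has identified an Agent and a Recipient for each psycho-thematic match, the "who does what to whom" output amounts to counting triples. The Python sketch below assumes a hypothetical list of (agent, category, recipient) triples in which agents and recipients are already labeled as individuals or groups; it does not reproduce NarrCat's output format.

```python
from collections import Counter


def relation_table(triples: list[tuple[str, str, str]]) -> Counter:
    """Count how often each agent directs each category toward each recipient."""
    return Counter(triples)


# Hypothetical SRL output: (agent, psycho-thematic category, recipient).
triples = [
    ("ingroup", "negative emotion", "outgroup"),
    ("outgroup", "activity", "ingroup"),
    ("ingroup", "negative emotion", "outgroup"),
    ("self", "positive evaluation", "other"),
]

table = relation_table(triples)
for (agent, category, recipient), n in table.most_common():
    print(f"{agent} -> {recipient}: {category} ({n})")

# Aggregating further, e.g. everything the ingroup directs at the outgroup:
ingroup_to_outgroup = sum(n for (a, _, r), n in table.items()
                          if a == "ingroup" and r == "outgroup")
```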

Negation Module
The Negation Module is able to find negative particles, pronouns, adverbs, postpositions and privatives in textual corpora (see Figure 11).
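A rough impression of what the Negation Module looks for can be given with a toy Hungarian word list. The Python sketch below is illustrative only: the short lists and the suffix pattern for privatives (-talan/-telen, -tlan/-tlen) are assumptions standing in for NarrCat's dictionaries and local grammars, and the suffix heuristic ignores inflected forms and may misfire on unrelated words.

```python
import re
from collections import Counter

# Toy Hungarian word lists (illustrative, not NarrCat's dictionaries).
NEGATION_LEXICON = {
    "particle": {"nem", "ne", "sem"},
    "pronoun": {"senki", "semmi", "senkit", "semmit", "semelyik"},
    "adverb": {"soha", "sehol", "sehova", "sehová", "sehogy"},
    "postposition": {"nélkül"},
}
# Privative suffixes on uninflected words only (crude heuristic).
PRIVATIVE = re.compile(r"(?:talan|telen|tlan|tlen)$")


def find_negations(text: str) -> Counter:
    """Count negation markers per category in a Hungarian text."""
    counts: Counter = Counter()
    for token in re.findall(r"\w+", text.lower()):  # \w is Unicode-aware
        for category, words in NEGATION_LEXICON.items():
            if token in words:
                counts[category] += 1
                break
        else:  # no lexicon hit: try the privative suffix
            if PRIVATIVE.search(token):
                counts["privative"] += 1
    return counts


# "Nobody came, and I have never been this unhappy without friends."
sentence = "Senki sem jött, és soha nem voltam ilyen boldogtalan barátok nélkül."
print(find_negations(sentence))
```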
From a psychodynamic viewpoint, negation in self- and group narratives has been studied in the context of adaptation to a healthy human environment and to moral standards; it is taken to indicate an inclination to devalue the world and a proneness to destruction and self-destruction (Hargitai et al., 2007). From a social cognitive viewpoint, negation is an indicator of stereotyping (Beukeboom et al., 2010).

Hypermodules of NarrCat
As noted above, the NarrCat system has the unique feature of enabling higher level exploration of composite psychological processes related to individual and group identity. The technical basis of this feature is that dictionaries, local grammars, submodules, and modules can be combined in a comprehensive and flexible manner. At present, the two hypermodules are Psychological Perspective and Spatio-Temporal Perspective.

Psychological Perspective Hypermodule
The Psychological Perspective Hypermodule summarizes the matches found by the Cognitive and Emotionality Modules, and by the Intentionality Submodule of the Agency Module. This module identifies words and expressions in textual corpora that describe the internal mental states of characters in individual and group narratives. The module is illustrated in Figure 12.
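Since the Psychological Perspective Hypermodule summarizes matches from other modules, its core can be sketched as adding up per-character counts. In the Python sketch below, the per-character `Counter` inputs are assumptions standing in for the outputs of the Cognition and Emotionality Modules and the Intentionality Submodule, and the character names are hypothetical.

```python
from collections import Counter


def psychological_perspective(*module_matches: Counter) -> Counter:
    """Sum per-character match counts across the contributing modules."""
    total: Counter = Counter()
    for matches in module_matches:
        total.update(matches)  # Counter.update adds counts
    return total


cognition = Counter({"Anna": 4, "Béla": 1})
emotion = Counter({"Anna": 2, "Béla": 3})
intentionality = Counter({"Anna": 1})

perspective = psychological_perspective(cognition, emotion, intentionality)
print(perspective.most_common())  # [('Anna', 7), ('Béla', 4)]
```

Simple summation counts an expression twice if it matches more than one module; merging matched text spans instead of counts would avoid that, at the cost of keeping position information.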
The use of perspective may play a role in the study of psychological issues such as mentalization, empathy, or perspective-taking in the reception of literary or historical texts (Vincze et al., 2013).

Spatio-Temporal Perspective Hypermodule
Spatio-temporal perspective refers to the relation between the content of a narrative and the position from which the content is narrated. Three forms of spatio-temporal perspective have been described: Retrospective, Experiencing, and Metanarrative. The Spatio-Temporal Perspective Hypermodule identifies the form of spatio-temporal perspective taken by the narrator. The components of the hypermodule are depicted in Figure 13.
Figure 11. The Negation Relational Module of NarrCat (components: Pronouns, Words, Adverbs, Privatives; each with LGs and DICs).
The psychological relevance of spatio-temporal perspective was explored in studies on coping with threatened social identities and on emotion regulation. The results of a series of studies suggested that the use of the Retrospective form reflects progress in the elaboration process, whereas the use of the Experiencing form, and even more so the Metanarrative form, indicates difficulties in coping with threatened identities, for example, in homosexual individuals or in women after unsuccessful in vitro fertilization treatment. Congruent with these findings, another subset of studies in this context showed that even lay people were able to perceive that a more pronounced use of the Retrospective form reflects better coping in the self-narratives of subjects with threatened identities (Pólya et al., 2005).

Summary and Conclusion
In the introduction we celebrated the return of language into social psychology, with special focus on the LCM model and its kindred perspectives. We showed that linguistic investigation in social psychology is far from being limited to the level of words or phrases: As Turnbull (1994) suggested, the thematic structure of sentences may have psychological implications as well. This stream of research provided ample evidence that lexical and grammatical choices mediate social perception.
The systematic research of Pennebaker and his coworkers is directed to the psychological correlates of linguistic forms (Tausczik & Pennebaker, 2010). They have identified several lexical and grammatical word categories that express underlying psychological constructs. Moreover, they have automated their content analytic devices, thereby introducing language technology into psychological research.
Scientific narrative psychology is an attempt to integrate achievements of quantitative methodologies into the study of identity. Our studies provided evidence that narrative composition expresses psychological processes of identity construction, and thereby introduced the compositional level into the psychological study of language beyond the lexical and grammatical levels. These studies also showed that narrative composition can be operationalized linguistically (for summaries, see László, 2008, 2012; László & Ehmann, 2013). To measure narrative categories and narrative composition, a new analytic device has been developed: NarrCat, which exploits recent achievements of language technology.
A unique feature of NarrCat is its SRL capability. This function yields quantitative results about who or which group acts, evaluates, has emotions, or thinks something with respect to somebody or another group. Thus, the output depicts the narrative (psychological) composition of the interpersonal and intergroup relations that are relevant to the construction of identity. A fully functional SRL analyzer requires another language-processing component: anaphora resolution. In NarrCat, SRL and anaphora resolution are implemented partly through external linguistic parsers and partly as project-specific support modules; in many languages, both remain cutting-edge issues in present-day NLP.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: The authors are grateful to the Hungarian National Research Foundation for its support through Grant 81366 to the first author.