LEXICAL AND STATISTICAL PROCEDURES FOR IDENTIFICATION OF THEMATIC DOMINANTS OF AN AUTHORIAL TEXT IN MEDIA DISCOURSE

This article presents empirical data on the dominant thematic characteristics of texts about fashion by the English-speaking columnists Hamish Bowles and Suzy Menkes, obtained by means of semantic and statistical analysis (using the AntConc concordancer developed by Dr. Laurence Anthony of Waseda University, Japan). Fragments of the texts are examined from the perspective of the functional and semantic representation of nominativeness, process, and attributiveness as basic mental and linguistic categories. The dominant thematic units of nominativeness are “fashion” (22 %), “person” (20 %), “art and science” (10 %), “time” (4.5 %), “buildings” (4 %), “space and movement” (3 %), “inter-object relations” (3 %), and “plants and animals” (2 %). The category of process comprises the thematic groups “movement and transfer” (23 %), “creation and modification” (19 %), “mental processes” (18 %), “cooperation” (6 %), “speech” (5 %), “possession” (5 %), and “similarity” (4 %). The category of attributiveness is represented by adjectives and adverbs. The adjectives belong to the thematic groups “evaluation” (40 %), “color and shades” (13 %), “toponymical features” (10 %), “temporal features” (8 %), “size” (6 %), “materials and fabrics” (4 %), “shape” (3 %), “similarity / difference” (2 %), and “restriction” (2 %). Among the adverbs, the most frequent are circumstantial adverbs, which express the meanings suggested by “where”, “when”, and “how” (39 %), followed by degree adverbs, which fulfil the semantic function of comparison (28 %), connective adverbs, which provide the logical connection between lexical units (17 %), focusing adverbs, which implement the function of restriction (9 %), and stance adverbs, which reflect the author’s position in a text (7 %). The data may be used to objectify the lexical and thematic features of the thematic space or serve as reference material for conceptual studies of an author’s style.

media sphere and strengthening the role of major specialized media as a source of field-specific events, have contributed to increased research attention to large thematic units beyond literary discourse. Recent research in this field includes works on the thematic structure of political journalistic discourse by O. Noskova [5], on sketches in The New Yorker magazine by N. Petrova [6], and others. The development of computer-assisted methods of text processing has led to a new wave of research in which the traditional issue of thematic text structure relies on new methodological frameworks, namely corpus technologies and lexical statistics (see A. Buranova [2], I. Belousov [1]).
Modern linguistics defines the "theme of the text" as "the notional nucleus of a text; the condensed and generalized contents of a text" [4, p. 17]. However, as noted by K. Belousov, "...thematic space of a text is comprised of statistically-determined entities of more or less unity". Such statistical entities consist of dominant unities, among which are elementary units. According to K. Belousov, "in this competition, it is the synthesis of the largest unities that may be called the theme of a text" [1, p. 16].
The current research examines thematic dominants, that is, statistically frequent semantic groups of lexical units that ensure the thematic integrity of a text through the representative functional groups of nominativeness, process, and attributiveness, in journalistic articles on fashion. The research relies on quantitative and semantic analysis methods for the lexical and semantic categorization of texts on the fashion industry written for Vogue magazine by two English-speaking journalists, Suzy Menkes [Menkes] and Hamish Bowles [Bowles].
To study how individual style is manifested in an author's text, it is essential to compile single-author or multi-author corpora in order to collect and analyze statistical data as well as evidence of the characteristic use of certain semantic and grammatical categories. Preliminary processing of the text must correspond to the goals and objectives of the research: for example, studying the distribution of grammatical and/or functional categories in single-author corpora calls for part-of-speech tagging (POS tagging), in which a part-of-speech or function mark is assigned to every element in the corpus, either manually or by computer-aided procedures. An appropriately prepared corpus allows the generation of word lists arranged by frequency, for all words or only for those with a certain attribute, and thus the study of the frequency of specific word meanings and the occurrence of word collocations in the author's texts. Statistically relevant lexical units should then be examined in the form of a concordance, generally defined as the set of contexts in which a word or a collocation has been used. Such a procedure is supported by special software, a corpus manager or concordancer, which can immediately construct all available contexts for the entry in question. The benefit of studying lexical units in their original context is the ability to determine their semantic, pragmatic and connotative features.
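The two operations described above, a frequency-ordered word list and a concordance, can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the tokenizer, the function names (`tokenize`, `frequency_list`, `concordance`) and the sample sentence are assumptions introduced only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A deliberately simple tokenizer (an assumption of this sketch):
    letters, optionally joined by internal apostrophes or hyphens.
    """
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs ordered from most to least frequent."""
    return Counter(tokens).most_common()


def concordance(tokens, node, width=3):
    """Return keyword-in-context lines for every occurrence of `node`.

    Each line shows up to `width` tokens on either side of the hit,
    with the node word in square brackets.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented sample text, not taken from the corpus of the study.
sample = "The dress was bold. The designer cut the dress in silk, and the dress moved."
tokens = tokenize(sample)

top_two = frequency_list(tokens)[:2]
contexts = concordance(tokens, "dress", width=2)

print(top_two)
for line in contexts:
    print(line)
```

For real corpora the tokenizer would need to handle numerals, abbreviations and multi-word units, which this sketch deliberately ignores.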
In the current research, part-of-speech tagging is conducted with the online instrument Part of Speech Tagging (Standard), available at the Xerox linguistic tools page. The results of processing a corpus are represented by a word list in which each entry is assigned a root form and a "tag", an identification mark of the hypothesized part of speech, e.g. +vaux (auxiliary verb), +advcmp (comparative adverb), etc. Word lists produced in this manner are highly accurate; they do, however, require manual verification of each entry and disambiguation where necessary.
The current study also relies on the AntConc concordancer developed by Dr. Laurence Anthony of Waseda University, Japan. This software works with corpora in "*.txt" format, marked up in accordance with the objectives of the study. The concordancer makes it possible to study lexemes in their original context, to search by word part, by collocation or, provided the corpus is sufficiently marked up, by certain categories, and to generate frequency lists by specified criteria.
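As an illustration of what searching a marked-up plain-text corpus by word part or by category involves, the following Python sketch may be helpful. It does not reproduce AntConc's implementation or interface. The `word_TAG` mark-up format, the tag labels `NOM`, `PROC`, `ATTR` and `X`, and the function name `search_tagged` are all assumptions made for this example.

```python
import re


def search_tagged(text, word_pattern=".*", tag_pattern=".*"):
    """Find tagged tokens whose word and tag match the given regex patterns.

    Assumes (hypothetically) that every tagged token is written as
    `word_TAG` and that tokens are separated by whitespace.
    Returns a list of (word, tag) pairs in text order.
    """
    hits = []
    for item in text.split():
        word, sep, tag = item.rpartition("_")
        if not sep:
            continue  # token carries no tag, so it cannot match a category
        if re.fullmatch(word_pattern, word, re.IGNORECASE) and re.fullmatch(tag_pattern, tag):
            hits.append((word, tag))
    return hits


# Invented sample line in the assumed word_TAG format.
tagged_text = "designer_NOM designed_PROC a_X bold_ATTR design_NOM"

by_word_part = search_tagged(tagged_text, word_pattern="design.*")
by_category = search_tagged(tagged_text, tag_pattern="NOM")

print(by_word_part)
print(by_category)
```

The first call retrieves every token that begins with "design" regardless of its tag; the second retrieves every token marked with the assumed nominativeness label.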
The preparation of the mini-corpus for our research consisted of POS tagging of the text data, followed by assigning each element a mark corresponding to the functional categories of nominativeness, process or attributiveness. The function of nominativeness is associated with the naming of objects; the function of process realizes the meaning of "action", either as a manifestation of energy or as a physical, emotional or psychological "state"; the function of attributiveness implies the meaning of quality or the character of a process (attribute of action), object (attribute of object) or attribute (attribute of attribute).
To study the dominant thematic groups within these functions, all lexical units of the corpus are manually assigned a thematic tag and subsequently categorized into major thematic groups. This mark-up of the mini-corpus permits the generation of frequency lists of functional-semantic and thematic categories. According to the statistical data obtained, nominativeness units account for 23 % of the total word count, process units for 14 %, and attributiveness units for 12 %.
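The relative share of each functional category is simply its token count divided by the total number of tokens in the corpus. A minimal Python sketch of that calculation follows; the data layout (a list of `(word, tag)` pairs with `None` for unmarked words), the tag labels and the sample tokens are assumptions for illustration and do not come from the study's corpus.

```python
from collections import Counter


def category_shares(tagged_tokens):
    """Return the percentage of all tokens that falls into each category.

    `tagged_tokens` is a list of (word, tag) pairs; a tag of None marks
    a token outside the functional categories. Shares are computed
    against the total token count, including untagged tokens.
    """
    total = len(tagged_tokens)
    if total == 0:
        return {}
    counts = Counter(tag for _, tag in tagged_tokens if tag is not None)
    return {tag: round(100 * n / total, 1) for tag, n in counts.items()}


# Invented ten-token sample with hypothetical tag labels.
tagged = [
    ("the", None), ("designer", "NOM"), ("cut", "PROC"), ("the", None),
    ("bold", "ATTR"), ("dress", "NOM"), ("in", None), ("silk", "NOM"),
    ("and", None), ("left", "PROC"),
]

shares = category_shares(tagged)
print(shares)
```

Because untagged tokens stay in the denominator, the category shares need not sum to 100 %, which is consistent with the way the shares are reported relative to the total word count.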
The comparative analysis of the frequency lists demonstrates that the relative shares of each category under examination are close in the texts of the two journalists (nominativeness units: 22 % and 24 %; process units: 14 % and 13 %; attributiveness units: 14 % and 11 % in the texts by H. Bowles and S. Menkes, respectively). Such slight variation in the figures (1-2 %) with respect to the choice of functional categories in both authors' texts indicates the uniformity of the lexical and grammatical units of the thematic space under examination. This supports the relevance and objectivity of the results within the framework of the current research (see Table ).
The statistical analysis procedure allows the identification of the most frequent, i.e. dominant, thematic groups within the functional category of nominativeness in the examined mini-corpus. In addition to the groups listed, the texts encompass other categories, such as "quantity", "shape", "sound", etc., each with a total share of less than 1 %. These statistically insignificant groups are therefore not included in the report.
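The selection of dominant groups by a share threshold can be sketched in Python as follows. The 1 % cut-off is taken from the text above; the function name `dominant_groups` and all counts in the sample are invented placeholders, not figures from the study.

```python
def dominant_groups(group_counts, threshold=1.0):
    """Return (group, share) pairs whose share meets the threshold.

    `group_counts` maps a thematic group to its number of tokens.
    Shares are percentages of the summed counts, rounded to one
    decimal place, and the result is ordered from largest to smallest.
    """
    total = sum(group_counts.values())
    if total == 0:
        return []
    shares = {g: round(100 * n / total, 1) for g, n in group_counts.items()}
    kept = [(g, s) for g, s in shares.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts chosen only to make the arithmetic easy to follow.
counts = {"fashion": 600, "person": 300, "time": 90, "sound": 6, "quantity": 4}

dominant = dominant_groups(counts)
print(dominant)
```

With these placeholder counts, "sound" (0.6 %) and "quantity" (0.4 %) fall below the cut-off and are dropped from the report, while the remaining groups are listed in descending order of share.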
Thematic systematization of the units that correspond to the function of process involved categorizing all verbs and verbal forms by their predicative function (i.e., notional and functional verbs) and then assigning each notional verb to a semantic class. This made it possible to analyze the thematic organization of this functional group, which in turn revealed the following dominant thematic groups:
1. Movement and transfer of objects: this category makes up 23 % of the total number of verbs in the examined texts and embodies the concepts of "movement", 10 % (move, rotate), and "transfer of objects", 13 % (bring, carry, spend).
5. Speech: this category comprises acts of speech and the vocal transmission of information, making up 5 % of the total verb count (say, respond).
6. Possession: this thematic group accounts for 5 % of the total number of verbs (own, possess).
7. Similarity: this category conceptualizes the relation of similarity and representation and makes up 4 % of the total number of verbs in the mini-corpus (seem, look, represent).
The category of attributiveness is represented by adjectives (9 % of all words in the mini-corpus) and adverbs (3 % of all words). Semantic analysis of the adjectives enabled the identification of the following dominant thematic groups:
1. Evaluation: this thematic group, which manifests the author's evaluation of or attitude towards the described object, accounts for 40 % of the analyzed adjectives (discomfiting, hefty, overwhelming).
4. Focusing adverbs account for 9 % of the total number of adverbs in the texts and represent the function of "restriction". The dominant subcategories in this group are "reinforcement", 5 % (even), and "restriction", 4 % (just, only).
5. Stance adverbs make up 7 % of the adverbs. The function of this thematic category is to reflect the author's attitude in the text itself. Among the thematic subcategories of this group are "attitude", 4 % (hopefully, seemingly), "viewpoint", 1 % (playfully), and "possibility", 1 % (perhaps).
The lexico-statistical analysis of the thematic organization of these journalistic texts with the aid of corpus technologies and statistical methods makes it possible to identify the dominant thematic characteristics within a range of functional-and-semantic categories. The data obtained from authorial corpora related to a certain theme can be used to objectify the linguistic features of textual unities; they can be equally instrumental in studies of individual authorial characteristics, through categorical and thematic systematization of the texts and comparison of the obtained characteristics with the results from a reference common corpus.