Empirical Multimodality Research

Multimodality research has always shown a strong reliance on data. However, the field has primarily developed around more exploratory, descriptive, and interpretative work on smaller data sets - as suggested by results we present from a meta-study of contributions to three multimodality-close international journals (Social Semiotics, Visual Communication, Multimodal Communication). Framed by a discussion of the qualitative-quantitative dichotomy and a deliberately broad working definition of empirical, we argue that it is not sample size or quantitative methods alone that support a more solid empirical grounding of multimodality research, but rather an explicit orientation to just how theory and analysis make contact with data. To this end, we propose five quality criteria of empirical practice: completing the empirical feedback loop from theory to data and back, ensuring objectivity, reliability, and validity in research, and acknowledging the inherent tentativeness of results. We thereby seek to chart paths for an appropriate and productive application of various empirical methods to novel (and supposedly familiar) forms of meaning-making in order to further strengthen the development of theory and methods in multimodality, and to encourage an even more intense exchange among the diverse communities of multimodalists.


Introduction and the Aims of this Volume
As a research endeavor first and foremost borne out of the practical observation that all meaning-making naturally involves a multitude of forms of expression, multimodality research has always been driven by data. When we look more closely at the kinds of explicitly 'empirical' work that have preoccupied multimodalists of many stripes over the past 25 years, however, it is fair to say that multimodality research locates itself mainly towards the smaller-scale and more qualitative poles of the empirical continuum. In many respects, this is understandable: driven by the challenges of engaging with new kinds of increasingly complex research objects, the field has developed around more exploratory, descriptive, and interpretative work undertaken with respect to smaller sets of data.
Nevertheless, as research objects continue to diversify, the interest of neighboring disciplines increases, and the field shows signs of becoming a stand-alone discipline in its own right (see Wildfeuer et al. 2019), the need for solid empirical grounding is also becoming ever more apparent. In fact, diverse scholars in multimodality have been pointing for some time, and often quite independently of one another, to the usefulness of 'large n' empirical investigations (see, e.g., Stöckl 1997; Bateman et al. 2004; Gu 2006; Carter & Adolphs 2008; Nakano & Rehm 2009; Bednarek 2015; Hiippala 2015; Pederson & Cohn 2016; Bezemer & Cowan 2021). Furthermore, scholars are increasingly stressing the shortcomings of exclusively broad-brushed orientations (e.g., Jewitt 2017; Kohrs 2018). Multimodality research as a whole thus seems poised at a particular point of development where exploratory studies can beneficially be complemented by further kinds of empirical work, bringing the potential of a productive interaction across the entire empirical continuum into view.
This development requires careful consideration, however, and a certain 'hesitation to scale up' is still common. There are several reasons for this, ranging, on the one hand, from work indeed being so experimental in nature that larger-scale studies might well be premature to, on the other hand, a lack of knowledge and experience concerning just how such larger-scale studies might be conceived and conducted. In many institutions where approaches to multimodality are taught, methodologies for larger-scale studies are not prominent on the curriculum. Moreover, addressing this concern is not just a question of applying well-established techniques from elsewhere: there are also significant theoretical issues revolving around just how empirical methods can be productively applied to novel forms of meaning-making. It is often by no means clear how best to proceed, and further methodological guidance - or "greater rigor and investment of effort in developing robust conceptual frameworks and reliable methods" (Thomas 2019: 86) - is urgently required.
This challenging situation constitutes the overall context for the current volume. As larger multimodal corpora become available and computer-based tools are developed to assist the processing of greater quantities of multimodal data, scholars, often collaborating in teams across national, disciplinary, and methodological borders, increasingly seek a more solid empirical grounding for multimodality research. Some of these endeavors are documented in the contributions to this volume, where our goal has been to demonstrate a range of engagements with multimodality research that exhibits a strong empirical orientation. The various chapters in the book consequently include both example analyses where this is done and some of the methodological and theoretical concerns that such work raises. In addition, the need to make approaches to multimodally complex artifacts and performances more receptive to empirical grounding requires both more foundational work on methods and the adaptation of relevant empirical methods from other research areas. Several chapters of the book address these concerns specifically.
One of the essential preconditions for making appropriate and productive contact with data is to complete the empirical 'feedback loop' from theory to data and back to theory. Without this loop, research in multimodality will continue to lack the solid empirical grounding that now appears necessary for progress. Establishing how this can be done in a methodologically appropriate fashion is, in our view, just as important as increasing sample sizes. It also legitimizes smaller-scale work whenever attention is paid to core tenets of conducting empirical research - that is, it is not sample size alone, or quantitative methods, that make the difference, but rather an explicit orientation to just how theory and analysis engage with data. Any turn to the empirical in multimodality research is therefore also in need of a more critical and thorough reflection on how this connecting of theory with data and vice versa is to be achieved. This is a dimension of empirical investigation that has long been taken for granted and is only recently beginning to receive the attention it requires within multimodality research. Deepening this discussion is then a main aim of this book, reflecting carefully on quality criteria for conducting empirical research with data-sensitive/-responsive concepts and frameworks suitable for the review and renewal of research practices and hypotheses.
For the purposes of framing the contributions collected in the volume and of positioning the view of empirical multimodality research we envisage more productively, we organize the discussion around the following three factors:
- Methods. The volume presents methods for investigating a broad variety of multimodal artifacts (corporate logos, advertisements, news texts, posters, films, video games) and performances (e.g., political TV interviews, face-to-face teaching, oral narrative), in which theoretical frameworks in multimodality research of rather different kinds are carefully applied to - typically - larger data sets.
- Evaluations. The case studies presented also support critical evaluations of existing theoretical and methodological frameworks. The book consequently includes several contributions that deal more exclusively with questions of moving from 'theory to empirical inroads' and which thereby evaluate current practices of applying theory to data.
- Implications. The contributions also reflect on the implications of their findings - be they of theoretical, methodological, or analytical nature - and make concrete suggestions for the adaptation and expansion of existing practices and the design of future research projects with an empirical slant.
We see these perspectives as offering particular insights into the process of conducting empirical research. However, before proceeding to the contributions themselves, we provide in this introduction a brief overarching characterization both of the nature of empirical research and of its current state within the field of multimodality - we mentioned an 'empirical continuum' above, but just what does this entail? Providing more detail here will help anchor the various directions which the contributions to the volume illustrate against the backdrop of a growing orientation to empirical work in multimodality more broadly.
First, then, we address the traditional dichotomy of qualitative versus quantitative and relate it to the notion of empirical. This is important as a preliminary stage in bringing together formerly rather disjoint sets of methods adopted for multimodality research. Second, we ask just how empirical the field of multimodality has in fact already become. To answer this question, we present results generated in a quantitative study of the publication output of three prominent international journals in the field: Social Semiotics, Visual Communication, and Multimodal Communication. Third, on the basis of our survey, we argue that promoting an 'empirical turn' in multimodality research is now justifiable and beneficial and, to encourage such work, we propose a list of five quality criteria that can be drawn on to shape empirical practice. Finally, we preview the contributions to the book from the perspectives outlined and draw out some broader implications for our understanding and practice of empirical multimodality research.

Notion of Empirical
Historical Roots. A look at some of the earliest endeavors in research reveals how scholars have always naturally conducted work that can be considered 'empirical' to generate knowledge. A more solid empirical grounding of multimodality research is thus by no means a daring new move, but rather a reorientation towards the close connection between a research interest in real-world phenomena and their in-depth study to generate answers. This realization also points to a common denominator in qualitative and quantitative approaches, whose dichotomous relationship results from historical convention rather than some inherent difference in nature.
To show this it is revealing to consider historical precedents. Up until the end of the Middle Ages, for example, a world view had been cultivated that was deeply entrenched in superstitious belief. The Scientific Revolution of the 16th century then opened up an eagerness to explore the natural world by means of controlled experiments and the invention of tools and instruments to pursue them (Kevles 1992: 12). This development saw the natural world as separate from the perceiving individual - a principal tenet of an epistemology glossed as positivism by Auguste Comte in the 19th century. In this view, an objective truth is made accessible through observation and testing of cause-effect relations, which in turn prepares the ground for unbiased explanation, generalization, and the establishment of universal laws (Bhattacharya 2008: n.pag.; Sousa 2014: 211; see also Kirk & Miller 1986: 14).
The Early Modern Period then witnessed the strengthening of alternative approaches to knowledge generation, such as hermeneutics, which is practiced by interpreting the written and spoken word, originally primarily biblical texts. A growing emphasis on intuitive processes of understanding led Dilthey to postulate around 1900 what he perceived to be the main difference between the natural sciences and the humanities: while the former explain the world, the latter seek to understand it (Bühler 2003: 4). This epistemological development continued throughout the 20th century, perpetuating (supposedly) different practices of research and debate, as well as different attitudes and perceptions of one another (cf. Yanow & Schwartz-Shea 2015: xiii).
Research Paradigms. Research in the natural sciences is consequently interested in generating objective facts and revealing universal regularities. Typically, by convention, scholars investigate larger quantities of data that have been representatively sampled (to account for natural variation), and pursue approaches that involve controlled measurement procedures; their methodological orientation is thus quantitative. Equally interested in revealing regularities and patterns, the related branch of (empirical) sociology draws on the possibility of quantification as well, but pursues essentially qualitative approaches, and thus crosses the (supposed) border between quantitative and qualitative research (see Kromrey 2002). By tradition, the humanities in contrast show a strong leaning towards using introspection, interpretation, and subjective perspectives to achieve an in-depth understanding of the 'nature' or quality of things (Kirk & Miller 1986: 9).
Unfortunately, these traditions and conventions have tended to push humanistic scholarship into the position of a clear counter-player to natural scientific research, resulting in misconceptions such as "[i]f statistics and 'large n' studies [...] were to be understood as quantitative analysis, then 'small n' studies using nonstatistical methods [...] must be 'qualitative' analysis" (Yanow & Schwartz-Shea 2015: xiii; see also Riesenhuber 2009: 7). Such assumptions about sample sizes are inherently problematic because they suggest that qualitative humanistic research must conform to just those validity standards adopted for 'objective' quantitative natural science research for that scholarship to be considered worthwhile (cf. Bollnow 1974: 1; Sousa 2014: 212). In a similar fashion, some humanities scholars come to the counter-view that "the search for patterns, regularities, or laws has no place in the Humanities" (van Peer et al. 2007: 7).
Since the mid-20th century, such 'two-fold taxonomies' (Yanow & Schwartz-Shea 2015: xiii) of quantitative vs. qualitative and natural science vs. humanities have been subject to increasing critique. This is not least documented in C.P. Snow's well-known Rede lecture of 1959, in which he describes two separate 'cultures' of scholars, the natural scientists and the 'literary intellectuals' (Snow 2001 [1959]: 2-4). While being "comparable in intelligence", Snow had noticed that they had "almost ceased to communicate at all" and urged his audience to understand the "practical and intellectual and creative loss" because "we are letting some of our best chances [for discovery, JP/JB/JW] go by default" (Snow 2001 [1959]: 2-4, 11, 16). More recently, a growing appreciation has developed of the productive synergies that more integrative and collaborative approaches can support (Brannen 1992). In this context, scholars of 'both trades' engage in extensive discussions about the fallacies of their own paradigms. In particular, quantitative research, and the positivist or reductionist paradigms it has traditionally been associated with (Bollnow 1974: 2), is now commonly described as being equally based on theorization, and its questions and interpretations as socially derived (Bhattacharya 2008). Qualitative research, in turn, has been accused of relying too much on intuition and speculative reasoning (Sampson 2002: 2; van Peer et al. 2007: 7, both in reference to linguistics), and of cultivating a habit of generalizing arguments despite a noticeable 'gap in evidence' (Piper 2016: 5-6, in reference to cultural studies).
Many of these broader social currents play out in microcosm in linguistics, which itself also has a considerable tradition of conducting empirical research dating back to the 18th century. Sub-fields such as computational and corpus linguistics, applied linguistics, phonetics, psycho- or sociolinguistics (and much interdisciplinary work) have long cultivated the use of empirical methods, with a pronounced emphasis on quantitative methods and experimentation (cf. Wasow & Arnold 2005: 1485). While scholars have acknowledged that linguistics "straddles the humanities/science borderline" (Sampson 2005: 17) and that there are valid areas of linguistics where empirical quantitative methods do not apply (Sampson 2005: 17), they have also urged that there is still much room for further productive development bridging these perspectives (Sampson 2002: 1).
Consequently, scholars in linguistics and beyond have been reflecting upon what original contributions to knowledge generation they might make. Researchers with a leaning towards qualitative approaches, for instance, have begun to acknowledge and embrace the particular contributions made possible by their viewpoints. Arguments are made for their adequacy in generating valid scientific knowledge (Sousa 2014: 212) even if this challenges traditional notions of 'truth' and 'evidence' (Bhattacharya 2008). Although there is still a "need for clear criteria governing [...] monitoring, rigour, and quality assessment" (Sousa 2014: 212), a more explicit legitimization of qualitative approaches grants them a secure position in more complex research procedures, for instance when developing hypotheses, which are then made testable through quantitative research (Riesenhuber 2009: 6, 7).
Empirical - A First Approximation. These considerations make any subscription to particular research paradigms a matter of more/less rather than either/or. Indeed, a raised awareness of methodological diversity within both paradigms (Benoit & Holbert 2008: 622) blurs the traditional distinction between quantitative and qualitative forms of inquiry and has revealed a core interest that both the humanities and the natural sciences share: conducting empirical research.
In the widest sense, then, empirical research simply means seeking to answer research questions about real-world phenomena by means of studying intersubjectively observable data (be they real/authentic, manipulated in experimental settings, or even intuition-deduced), whose results are utilized to reassess previous knowledge structures and associated hypotheses (cf. Bateman & Hiippala, this volume). The question of how to accomplish the move from theoretical/hypothetical assumptions to data description and back, therefore, lies at the heart of empirical research, and so needs to be considered and reported on thoroughly. Contrary to the prevalent but misleading assumption introduced above that has tended to label methods according to the size of data samples, this also means that, if the connection between theory and data is sound, the label empirical is not solely reserved for quantitative 'large n' studies and can apply equally to 'more qualitative' work, even when smaller in scale (cf. Benoit & Holbert 2008: 615).
This notion of empirical research will be expanded on in Section 4.1 below specifically for the multimodality case. This will allow us to set up a view of empirical multimodality research that is part of, and contributes to, other branches and directions of multimodality rather than being a disjoint 'school of thought'. This then furthers our main aims of encouraging productive dialogue and exchange between multimodality approaches that are small-scale, qualitative, and perhaps exploratory on the one hand, and approaches that are larger-scale with quantitative support on the other.
as being characterized by a particular combination of modes, while also adopting explicitly 'empirical' orientations. There are, for example, clear overlaps in questions and, increasingly, in methods from work within human-computer interaction (HCI), multimodal document design, multimodal interaction, and extended conversation analytic perspectives.
Among these approaches there is already a long tradition of applying diverse ranges of empirical methods. Such methods include eye-tracking (e.g., Bold & Herranz 1992; Thorisson et al. 1992; Koons et al. 1993), behavioral user studies (e.g., Giard & Peronnet 1999), and later also neurocognitive studies. There has also been work aimed at providing large-scale corpus analyses across genres or from a diachronic perspective that generally adopts a more quantitatively oriented, data-driven perspective. Now this form of empirical work is increasingly overlapping with approaches in which transcription has always been a substantial first step in analysis, but is now commonly extended to the description of 'additional' interactional resources, such as gaze (e.g., Goodwin 1980) or gestures (e.g., Streeck 1983). Many of the challenges here are consequently common across both multimodal corpus linguistics and multimodal transcription (Thibault 2000; Norris 2002; Baldry & Thibault 2005, 2006). Many of the investigations in these contexts present the most detailed approaches to the combination of modes to date and so constitute indispensable resources for future empirical research.
Thirty years after the beginnings of such empirical work, we wanted to probe the question of just how empirical multimodality research has in general become. For this, we undertook a systematic overview of empirical work as documented in contributions to several international journals clearly devoted to the study of multimodality. Our goal was to see if there has been any change over the past three decades concerning how articles present themselves along the qualitative-quantitative continuum. To obtain a view of the field and to get us closest to a blueprint of the scholarly debate (see Engels et al. 2018: 594-595), international journals offered a suitable object of study, enjoying quality control, a much wider distribution than monographs or handbooks, and easy accessibility.
The data for our survey was consequently sampled from three international journals that we take to be prominent for the broad 'communities' engaging in multimodality research: Social Semiotics (which commenced publishing in 1991), Visual Communication (2002-), and Multimodal Communication (2014-). For purposes of comparison across journals, we limited the articles from Social Semiotics to the time after 2000. Publications were then considered up to and including 2020. All data was gathered from the journals' respective online archives and search engines. Since the total number of contributions did not appear to be directly queryable, we used a search for the term 'communication' as a proxy, as we expected this to occur in most contributions. The resulting totals for Social Semiotics, Visual Communication, and Multimodal Communication were 682, 461, and 86 respectively. All references to total numbers of publications in the following draw on these values.
Each of the journals showed a different pattern concerning the number of published articles per year. Social Semiotics showed a slight but steady increase up until 2015 (from around 20 a year in 2000 to around 35 in 2015), while Visual Communication remained approximately constant at 25 over the same period. Then, after 2015, both journals showed a marked increase due to the transition to 'online first' publishing practices and backlogs in the publication queue being processed (with Social Semiotics reaching 60-80 articles in 2019-2020 and Visual Communication over 45). The trend for Multimodal Communication for the time period from its original appearance was quite different, however, with a slight decrease in published articles, starting at around 18 per year and ending at around 10; the number is now increasing again. Our specific goal for this chapter was then to explore to what extent articles have been explicitly identifying themselves as more or less empirical in orientation. Subsequently, on the basis of this selection, we investigated the distribution of methods and sizes of data sets among those articles.
The first step in this procedure was to find articles that (even implicitly) 'self-identified' as being empirical in the loose sense of showing a concern with data. For this, we retrieved all papers that contained occurrences of keywords that we took to be particularly likely to indicate an empirical orientation. The keywords used were 'empirical', 'statistical', and 'calculate', each with morphological variations for tense, adverbial usage, etc. accounted for. Again for current purposes, we restricted hits to those papers where the keywords in any morphological form occurred in the main bodies of the articles, excluding occurrences only in the abstracts or the references.
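To make this retrieval step concrete, the following is a minimal, hedged sketch of such a keyword filter in R; the data frame, its columns, and the example texts are invented stand-ins for the article bodies actually screened, and word stems approximate the morphological variants.

```r
## Hedged sketch only: a toy stand-in for the retrieved article bodies.
articles <- data.frame(
  id   = 1:3,
  body = c("We calculated inter-coder agreement for the annotations ...",
           "A social semiotic reading of the poster ...",
           "Statistical tests show a clear preference ..."),
  stringsAsFactors = FALSE
)

keyword_pattern <- "empiric|statistic|calculat"   # stems cover morphological variants
hits <- articles[grepl(keyword_pattern, articles$body, ignore.case = TRUE), ]
nrow(hits)   # papers provisionally logged as showing an empirical orientation
```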
Each article that was logged as matching these search criteria was examined to rule out cases where the keywords had been used but the article did not, in fact, exhibit any empirical orientation. For the articles remaining, we recorded the journal in which they appeared, the year of publication, the number of authors, and three further categories identifying the kinds of empirical work done. These were: (a) the type of empirical methods employed, either 'qualitative', 'quantitative', or 'mixed' (i.e., triangulating qualitative and quantitative methods), (b) the size of the data sets employed, coded as 'small', 'medium', or 'large', and, if applicable, (c) the types of statistical procedures employed, i.e., simple 'counting', 'descriptive', or 'inferential'. We give further details on the criteria used for these categories in our results below. The data were first collected in an Excel spreadsheet, with rows for each article and columns for each coding category, and then imported into R and R Studio (R Core Team 2016) for actual processing and visualization; graphing and visualization here are done with the R package 'ggplot2' throughout (Wickham 2016).
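As a hedged illustration of this processing step (not the authors' actual script), a coding table of this shape can be summarized into yearly proportions directly with ggplot2; the rows below are invented.

```r
## Hedged sketch: a few invented coding rows stand in for the spreadsheet imported from Excel.
library(ggplot2)

coded <- data.frame(
  journal = rep(c("Social Semiotics", "Visual Communication"), each = 6),
  year    = rep(c(2014, 2015, 2016), times = 4),
  method  = c("qualitative", "mixed", "qualitative", "quantitative", "mixed", "qualitative",
              "qualitative", "qualitative", "mixed", "quantitative", "qualitative", "mixed")
)

## Proportions of qualitative / quantitative / mixed articles per year and journal,
## shown as stacked bars scaled to 100% (in the spirit of Figure 2 described below).
ggplot(coded, aes(x = factor(year), fill = method)) +
  geom_bar(position = "fill") +
  facet_wrap(~ journal) +
  labs(x = "Year", y = "Proportion of logged articles", fill = "Method")
```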
An initial question was to compare the number of articles retrieved and classified as empirical to the overall number of publications for the journal in each year. The purpose of this was to see whether the proportion of articles self-identifying as empirical had undergone any changes over time. The results are shown in Figure 1, which sets out for each year and for each journal the number of articles judged to be empirical in orientation expressed as a percentage of the total number of articles that appeared in that year. The results for all three journals seem to indicate that it is becoming more common for articles to explicitly frame their work as engaging with data and data analysis. For both Social Semiotics and Visual Communication, however, the earliest values are surprisingly low, and so this may indicate that the articles before 2005 should be examined more carefully to see if they are formulating their engagement with data using words other than our adopted keywords. In the case of Multimodal Communication, we see, in sharp contrast, that the number of articles retrieved is a very high proportion of the total journal output, although again showing a marked upward trend over time.
We then turned to the kinds of empirical methods and the scale of the data employed in the articles retrieved and classified as empirical. The results of this part of our study are visualized in Figure 2. In this diagram, the first row sets out the proportions of logged papers that were classified according to the empirical approach adopted: qualitative, quantitative, or mixed. Each bar in the graphs shows how the logged papers divide up over those categories for that year - when a graph is all one color, then there was only one type of approach used in that year; when there are two colors, then the size of the colored regions shows the respective proportions out of the total logged papers for that year; and so on. We can see, therefore, that for the rightmost graph in the top row, for the journal Multimodal Communication, the proportions change quite dramatically over the years sampled, with the papers using mixed methods taking large proportions in two of the years towards the end of the sample. The situation for Visual Communication, shown in the middle of the row, is more evenly distributed, with the proportions for mixed methods remaining broadly the same from around 2008 onward. Nevertheless, there is an increase in the proportion of quantitative papers as well. The lower row of the diagram is read in a similar way, but in this case the proportions shown are for the size of the data sets analyzed, divided into small, medium, and large. For current purposes we set the cut-off values for these categories as 'small' being less than 20 analysis items, 'medium' between 21 and 60, and 'large' more than 60.
Here we can see that for the majority of the time sampled, the vast majority of articles were qualitative and small scale. Particularly for Visual Communication, however, the proportion of larger-scale studies has increased to a considerable degree, with the proportion of small-scale studies falling below 50% of the empirical articles from 2015 onward.¹ A similar pattern can be seen for Multimodal Communication, although with considerable variation. Social Semiotics has stayed predominantly small scale and qualitative throughout - showing, if anything, a slight increase in small-scale studies over time, although this impression may be artificially induced by the oddly high number of large- and medium-scale articles appearing in 2009-2010. It is interesting to note for all the journals both that the proportion of articles that adopt 'mixed' methods is generally far higher than that of articles simply reporting on quantitative results, and that larger-scale studies are more prominent than 'medium'-scale studies. The reasons for this would need further study, but it may be influenced by researchers, if they are using empirical methods at all, attempting to increase the size of their data sets. This would clearly be in the spirit of the move towards more empirical work that we are promoting here.
Finally, we investigated whether or to what extent there had been a change in the kinds of statistical methods employed. For this we contrasted articles where the quantitative treatment of the data included simple counts of items, where it included basic descriptive statistics such as means and standard deviations, and where it included standard inferential statistics, such as tests of significance of various kinds. We made a general distinction between counting and descriptive statistics, since even the most basic engagements with data may indicate how many cases were being examined without considering further quantitative properties. In addition, we were quite broad in our interpretation of 'inferential', including cases where, for example, corpus annotation accuracy had been verified with inter-coder reliability tests, and so on. The three-way distinction can therefore best be seen as a general indication of the sophistication of the statistical methods employed; finer-grained characterizations could certainly be pursued in the future. The results of our current classification are visualized in Figure 3; this shows the distributions in a slightly different way to that of the preceding graphs. In this figure, the three graphs show the breakdown of logged papers according to the selected categories of types of 'statistical' approaches similarly to before, but now they are not made to sum to 100% because we simply omit the remaining articles where no statistical methods were identified. These articles are not shown in the counts of the graphs in order to leave the pattern clearer among those articles that did use some forms of measurement. The figure then gives a better impression of the extent to which statistical measures of some kind are used with respect to the entire logged output for the journal for each year.
Fig. 3: Distribution of statistical methods per year for each journal, expressed as a proportion of the total number of articles logged as broadly empirical per year. The numbers at the bases of each column show the total number of quantitative papers per year. The numbers above each column show the total number of logged papers per year as visible in Figure 1 above. The grayed-out areas are years either where the journal in question did not appear or which lay outside our sample period. (Again, all graphs were produced with R 'ggplot2'.)
¹ [...] "voice approaches, participatory methods, and the focus group and Q-Method" as the remaining 34.85% of empirical methods; 10.49% of all articles that were annotated as 'empirical' were seen as quantitative approaches (see Thomson 2021). Focusing on geographical information as well as the visuals used in the papers, that study did not address trends or developments in the use of specific methods over the years. However, it becomes clear that, similar to our own study, the number of more quantitatively oriented approaches, such as experiments and participatory methods, is rather small (approx. 10%) in comparison to more qualitatively oriented approaches.
The actual numbers of articles using any of the three statistical methods are also shown in white in the graph, positioned at the bottom of the respective bars for each year. This shows that the absolute numbers we are talking about here are often very small; we thus avoid perhaps artificially inflating their apparent contribution by scaling their internal proportion dimensions to 100%. In addition, since the bars show proportions with respect to the total number of logged articles for each year (which varies), their heights do not correspond directly to absolute counts either. For example, if we examine the two leftmost bars for Visual Communication (middle graph), we see first a bar showing exclusive use of 'counting' and then a bar showing exclusive use of 'descriptive' measures. These are the same height, indicating that they constitute the same proportions of the total logged output of that journal for those respective years (around 26%), but they correspond to different absolute numbers (2 and 1 respectively, as shown in the graph) - from this we can see that the total number of logged articles for that journal in the first year was twice the total for the next year; this can also be read directly from the row of counts across the top of the graphs (6 and 3 respectively) as well as from the middle graph of Figure 1 above.
Taking all of the results together we can see, perhaps as would be expected, that the earlier articles all tended to offer either no numeric information or only basic counts concerning the data. As time goes on, there has been an overall increase in the use of inferential statistics among the papers identifying as empirical, particularly for Visual Communication and Multimodal Communication. Social Semiotics remains the journal where the least use is made of any statistical reporting beyond counts. As the figures in the graphs show, we are dealing with rather small numbers of absolute cases throughout, and so any conclusions must be treated with caution. Nevertheless, we do seem to see a general, slow increase across the past two decades in the kinds and scales of empirical work being reported on in these journals. We take this as moderate support for our initial contention that the field of multimodality is, indeed, becoming more open towards empirically-relevant work and so it is, as a consequence, certainly worthwhile now considering in more detail just how that move can be best supported without losing contact with work that is not so inclined.

Promoting an Empirical Turn in Multimodality
As we have now shown, multimodality research, as represented in key journals of the field, has generally seen a steady increase in empirical work, including a more recent strengthening of the larger-scale quantitative line of research. Moreover, as we noted above, areas such as multimodal conversation analysis (Deppermann 2013), interaction studies (Mondada 2007, 2016), and others (see Section 3) have in any case pursued more quantitative approaches from early on. Nevertheless, as our study suggests, the field still shows a preference for qualitative - that is, explorative, descriptive, and interpretative - work on comparably small data sets.
As argued above (see also Bateman 2016: 37), such work is not, of course, any less revealing per se because of its qualitative nature. However, if left unaddressed, conceptual vagueness that may result from integrating frameworks from the various 'corners' of this highly interdisciplinary field may certainly restrict the explanatory 'reach' of such contributions. Indeed, there seems to be a general "lack of appropriate methodological guidance" in the field (Bateman 2016: 37, also in reference to Halliday 1994 and Forceville 2007); explicit discussions of how precisely to move 'from theory to data' are rare (Bateman 2016: 37). Referring to cultural studies, Piper (2016: 6) describes this situation as "The Theory Gap". Furthermore, although 'small n' qualitative research is not a weakness of a discipline's empirical dimension in itself, it can become problematic if judgement continues to be led by untested intuition alone, particularly if we give in to the "temptation to generalize, to scale-up the nature of one's argument" (Piper 2016: 4) even when those generalizations are not founded on a sufficient number of data points; Piper (2016: 4) calls this "The Evidence Gap". Finally, there seems to be only a weak tradition of "documenting and theorizing our practices more extensively" (Piper 2016: 8, glossed as 'The Self-Reflexive Gap'), especially when it comes to disclosing the details of methodological procedures of data collection and analysis.
It has to be emphasized, of course, that pursuing empirical approaches is "not a simple recipe for an unrealistically 'clean' structure of knowledge" (Sampson 2002: 5). In fact, empirical research usually leads to "a more puzzling picture" (van Peer et al. 2007: 21) since it makes us aware of the complexities of real-world data, or "[s]ometimes, nothing happens" (Piper 2016: 6), a negative result that may cause frustration. Both of these phenomena must be considered as gains rather than losses, because 'a more puzzling picture' may be precisely what we need to get us closer to the communicative reality we seek to understand; even 'negative results' are revealing and should be considered "as important as the novel insight of something previously unseen" (Piper 2016: 6). If we begin to close the Theory Gap, we are likely to produce more objective and reproducible descriptions of individual materials; if we start to close the Evidence Gap, we will be able to generalize our descriptions more reliably; and if we can close the Self-Reflexive Gap, we begin to mark out more clearly the "terrain of what one knows [...]" (Piper 2016: 8).

Quality Criteria in Other Empirical Fields and Their Transferability
Research communities with a more pronounced emphasis on empirical work have naturally engaged in discussions of how the quality of their research can be critically assessed. In psychology, for instance, it is common practice to evaluate the quality of empirical, often experimental, research designs on the basis of several criteria. Three primary criteria are objectivity, reliability, and validity (see, e.g., Rost 2004; Himme 2009; Moosbrugger & Kelava 2014). There are also several secondary criteria: for instance, scalability, test economy, practicability, and fairness (see Himme 2009; Moosbrugger & Kelava 2014). The primary criteria, in particular, offer important points of reference for other research fields with an interest in human behaviour, e.g., empirical social science research (see Kromrey 2002: 390-392), and so will also become relevant for multimodality research.
Objectivity is achieved if the test procedure (including the materials, the actual testing, and the generation and interpretation of results) is independent of any influences other than participant-specific factors, that is, independent of the researcher conducting the test, and the place and time at which the test is carried out (Rost 2004; Himme 2009; Moosbrugger & Kelava 2014; see also Krippendorff 2004). Objectivity can generally be imposed through standardization, e.g., by using a test manual with detailed instructions (Moosbrugger & Kelava 2014: 10). Reliability is smaller in scope and zooms in on a test's measuring instruments and their capacity to produce the same results again and again upon repeating the test - independently of what the test is supposed to measure. Reliability is achieved when a test procedure produces highly correlating values across those variables that are assumed not to influence the test results and is typically assessed through retesting or parallel tests (Rost 2004; Himme 2009; Moosbrugger & Kelava 2014; see also Krippendorff 2004). Finally, validity concerns the content-related fit between what a test measures and what it is supposed to measure in light of the research questions. In that sense, it allows for estimating the meaningfulness of test results (Rost 2004; Himme 2009; Moosbrugger & Kelava 2014; see also Krippendorff 2004).
These three criteria are logically related to one another (Rost 2004: 33). Objectivity is a prerequisite for reliability because an accurate measuring instrument is useless if the reliable results it produces are not evaluated objectively (Rost 2004: 39). Also, objectivity and reliability may generally allow for a high accuracy of the measurements, but the results are meaningless if they do not actually 'respond' to the research questions posed (Moosbrugger & Kelava 2014: 13).
Qualitatively oriented research fields, to which many corners of current multimodality research are arguably closer, have engaged in extensive discussions of the transferability of these criteria for their own concerns. As a result, scholars have moved in different directions: those pursuing an extrinsic approach support importing criteria from quantitative research paradigms, and those pursuing an intrinsic approach suggest designing criteria exclusively for the qualitative context in which they are put to use (Sousa 2014: 213). It is important to note here that neither approach is per se better than the other. After all, the quality of research needs to be assessed in relation to the research field framing it, its core interests, paradigms, and epistemology (cf. Sousa 2014: 212). Such a view grants qualitative fields that value an "on-site flexibility and less stepwise research design" (Yanow & Schwartz-Shea 2015: xix) the freedom to conduct case studies and practice "contextualized ('thick') description" (Bhattacharya 2008).
Multimodality research, in contrast, has always shown a strong interest in finding regularities and patterns in communicative behavior, not least due to its strong ties to linguistics, semiotics, and communication studies. An urge to theorize, to construct typologies and taxonomies, often leads multimodalists to abstract away from the singular and particular - and thus requires a corresponding approach for undertaking empirical research to support this. Such an approach would, ideally, on the one hand, aim to integrate qualitative perspectives productively. Such perspectives are still particularly relevant to empirical multimodality research at its current stage of development: they are "invaluable tools for thinking semiotically and can support useful conjectures, new conceptual arrangements, and are always ready to address new phenomena" (Bateman 2019: 315). On the other hand, due to our clear interest in finding productive generalizations, empirical multimodality research needs also to rely on quantitative methods to a greater extent, while paying more attention to making the move from theory to data (and back) explicit.

Five Criteria for Good Empirical Multimodality Research
The following five criteria are intended to describe quite explicitly what exactly makes certain research empirical in our understanding. At the same time, we hope thereby to provide some guidance as to what to consider when designing and conducting empirical research in multimodality. Similar to the quality criteria established in other fields, the five main criteria we foreground are:
1. The Feedback Loop: From Theory to Data and Back
2. Objectivity
3. Reliability
4. Validity
5. Tentativeness of Results
In addition, we also extend this list with several indicators of good empirical practice: explicitness, transparency (see, e.g., the open science movement), replicability & replication, generalizability, and triangulation. As can be seen below, we have positioned these criteria in relation to the main ones. For reasons of space, further criteria, such as fairness, practicability, or sustainability, are not addressed here, but are equally important and demand consideration whenever empirical research projects are being designed. At the end of this section, we then also give a short note on the data needed for empirical analysis.

The Feedback Loop: From Theory to Data and Back
From Theory to Data. The first half of the loop requires a detailed operationalization of broader theoretical constructs in order to make them "reliably recoverable" (Bateman 2019: 303) and thereby to ensure that they describe the phenomena they are supposed to describe. Within Legitimation Code Theory (Maton & Chen 2016), several useful mechanisms have been described for these purposes: data instruments provide methodological recommendations as to how abstract theoretical concepts suggest "foci for data collection and questions for analysis" (Maton & Chen 2016: 30); mediating languages constitute typologies of categories that serve to make theoretical concepts more sensitive to the particularities of actual data; and translation devices operate at a low level of abstraction and are sensitive to the context of a particular study (Maton & Chen 2016: 31). Employing these mechanisms, and being explicit about how they were employed, ensures that the connection between theory and data can be made in a reliable fashion, and their data-sensitiveness helps avoid simply enforcing preconceived theoretical concepts (see Bateman 2019: 301).
From Data to Theory. The second half of the loop is concerned with processing the annotated data and relating the results back to theory. This can be done efficiently by a search for patterns and regularities, typically accomplished in a top-down (theory-driven) or bottom-up (data-driven) fashion (see Bateman & Hiippala, this volume). Such methods involve quantification and an exploration of correlations across various kinds of descriptions (Bateman 2014: 252). Statistical processing is not at all limited to numerical descriptions; all it takes to involve category-based annotations is a statistical model fit to process them (see also Bateman & Hiippala, this volume).
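As a hedged illustration of what such processing can look like for purely categorical annotations (the categories, genres, and counts below are invented, not taken from any study in this volume), a simple contingency table and an association test already traverse this second half of the loop in miniature:

```r
## Hedged sketch only: invented annotations of image-text relations in two genres.
set.seed(1)
annotations <- data.frame(
  genre    = rep(c("advertisement", "news"), each = 40),
  relation = sample(c("anchorage", "relay", "illustration"), 80, replace = TRUE)
)

tab <- table(annotations$genre, annotations$relation)  # cross-tabulation of categorical codes
print(tab)
chisq.test(tab)   # tests whether relation types are distributed independently of genre
```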
Statistical approaches typically require larger quantities of data to produce meaningful results. Thus, statements about the general validity of smaller-scale studies need to be made with appropriate caution. Even if annotations produced on the basis of a single text suggest a mismatch between previous theory and features of actual data, this indication remains weak until backed up with further results. Thus, corpus work needs to become larger in scale, which, in turn, means relying more than before on (semi-)automated analyses and visualization methods (cf. O'Halloran et al. 2011; Bateman 2014: 252; Kohrs 2018). Our community needs to continue working towards making such methods more accessible to fellow scholars, while promoting the pooling of our various skill sets in broader research teams.

Objectivity
Our investigations of multimodal artifacts or performances may be classified as objective to the extent that our frameworks can be applied without any more particular knowledge beyond what is specified in a test manual. Ideally, then, any analyst can apply the framework, at any place or time, with similar outcomes. Objectivity can thus be achieved if we are sufficiently explicit about previous assumptions or knowledge necessary for data collection and analysis (explicitness). One technique commonly used for this, particularly in content analysis (cf. Schreier 2012), is to produce a so-called code book in which one can document previous assumptions, methodological recommendations (data instruments), and typologies of analytical categories (mediating languages and translation devices).
It is increasingly seen as good practice in many fields to be transparent by making such code books publicly available, together with the actual data, all documentation regarding the research questions/hypotheses, the choice of methods, the annotation process, and even further processing steps (including code for statistics software such as R). All such considerations become ever more important when engaging with interdisciplinary work (see Yanow & Schwartz-Shea 2015: xv; see also Sousa 2014: 216).
If justice is done to the quality criterion of objectivity, a multimodal study not only becomes repeatable by other researchers (replicability), it also allows them to challenge previous research designs, to correct them, or to build on them (see Piper 2016: 7 on "The Self-Reflexivity Gap").

Reliability
Much of contemporary non-experimental multimodality research accomplishes the description of data through applying conceptual categories. In this context, reliability refers to whether a concept and its associated definition are capable of producing the same categorizations repeatedly when used to describe the same phenomena in similar data sets.² A high degree of reliability can be achieved by working out in detail how theoretical concepts are to be operationalized in their application to data, for example in the form of a code book as mentioned above. Training sessions in which coders annotate smaller data sets can be used to assess the reliability of an annotation scheme through inter-coder reliability checks (Krippendorff 2004: 215). Also, researchers can test whether the same results are generated upon repeating a study (testing for intertemporal stability). This also provides an opportunity to revisit decisions made in designing and conducting a study. Benoit & Holbert (2008: 615-616) argue that, while repeating a study is common practice in the natural or empirical social sciences, replication still has to gain traction in communication studies and other humanities(-related) research fields. Making research in multimodality more reliable requires explicit documentation (objectivity), even if this entails taking up 'valuable journal space' (Benoit & Holbert 2008: 616). And even prior to publishing a study's results, it can require considerable resources to conduct tests of the intersubjective stability of any coding used.
This may well be beyond the scope of individual research projects, particularly at graduate level, and so our recommendations here are straightforward. Even if the limited size of a research project makes it difficult to pursue 'double-coding', this goal should nevertheless be borne in mind as an 'ideal' that one is, for perfectly justifiable reasons, perhaps not achieving in some particular case. If one designs a project as far as possible so that double-coding could have been done, then the resulting design will be more likely to satisfy the other criteria more fully as well. We should generally re-think multimodality research as a 'team effort', even when the team remains unrealized. In any case, one should report whether reliability tests were conducted and, if not, indicate what made that difficult or impossible; sometimes research might simply be too exploratory to warrant reliability checking. Explicitness concerning this point is always preferable.
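For readers unfamiliar with such checks, the following is a minimal, hedged sketch of how a double-coding exercise might be scored in R; it assumes the add-on package 'irr' is available, and the two coders' annotations are invented for illustration.

```r
## Hedged sketch only: agreement between two (invented) coders who each assigned
## one category per analysis unit; requires the 'irr' package.
library(irr)

coder_A <- c("gesture", "gaze", "gesture", "posture", "gaze", "gesture")
coder_B <- c("gesture", "gaze", "gaze",    "posture", "gaze", "gesture")

kappa2(data.frame(coder_A, coder_B))   # Cohen's kappa for two coders, nominal categories

## Krippendorff's alpha expects a coders-by-units matrix; recode categories as integers
codes <- c(gesture = 1, gaze = 2, posture = 3)
kripp.alpha(rbind(codes[coder_A], codes[coder_B]), method = "nominal")
```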

Validity
As in empirical testing, achieving validity in an empirical multimodal study hinges on how well our research project scores with respect to the criteria of objectivity and reliability. Even if the descriptive categories correspond to the research question, the findings are unusable if the categories turn out to be too vague to produce reliable categorizations. Likewise, if a coder lacks a solid grasp of the coding scheme (even if it would have afforded reliability), and their annotations are thus skewed by a subjective 'interpretation' of the concepts, the study will not afford valid results. Paying close attention to carrying out a research project objectively and using reliable tools therefore forms a solid basis for achieving validity.
Validity can be ensured if researchers are explicit (explicitness, objectivity) about their research questions and hypotheses, while making ample reference to existing knowledge in whatever area of investigation is at issue (construct validity, Moosbrugger & Kelava 2014: 16; see also Krippendorff 2004: 315). On this basis, researchers should then seek to compile annotation schemes in such a way that the phenomena under investigation are adequately represented. An estimation of adequacy, in this context, is difficult to measure and is thus derived from an 'informed judgement' based on the knowledge shared within a research community (content validity, Moosbrugger & Kelava 2014: 15; see also Krippendorff 2004: 315).

Tentativeness of Results
This quality criterion seems simple since it is a presupposition of the idea of working scientifically. Upon closer scrutiny, however, both the idea and its consequences are far from trivial: the results we generate through empirical (multimodality) research are always tentative in nature, and may need to be replaced at some point (see Sampson 2005: 4; Sousa 2014: 217). This ties in with Peirce's notion of pragmatism and is also a necessary consequence of his concept of abduction, the process of forming explanatory hypotheses (Peirce 1931-1958: 5.172). The tentativeness of results is at the very heart of empirical research - a constant invitation to travel along the path of the feedback loop, and to be more 'knowledgeable' every time one comes round full circle.

A Final Note on Data
In addition to being intersubjectively accessible, the data for multimodal studies should be selected in view of the research question or hypotheses so that it affords answers to the questions posed (see Bateman & Hiippala, this volume). Once the type of data has been decided on, it is usually collected into a corpus (Bateman 2014: 239)³; many examples are given in the chapters of this book.
When conducting empirical research projects, researchers face the (vexed) question of how much data they will need in order to make any valid claims. The amount of data that one needs to find an effect depends on how frequently the effect occurs: if it occurs very often, then obviously it is more likely that some collection of data will include sufficient examples to draw conclusions. The amount of data required also depends on just how 'strong' an effect is - if it is a strong effect, then, again, less data will be necessary to show it at work. In any case, the data gathered needs to include at least data exhibiting the range of variations and phenomena that are the target of the research questions. For example, if one is probing the different use of multimodal resources made by contrasting groups of sign-users, then the data needs to include sufficient examples from those groups, and so on. There is no point probing data for variation that the data does not include.
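A standard power calculation makes this trade-off concrete; the sketch below is a hedged illustration with invented proportions, not a recommendation drawn from the chapters that follow. If a multimodal feature is expected in 30% of items for one group of sign-users, far fewer items per group are needed to detect a large difference in a second group than a small one.

```r
## Hedged illustration: required sample size per group as a function of effect strength
## (invented proportions; base R's power.prop.test).
power.prop.test(p1 = 0.30, p2 = 0.50, power = 0.80, sig.level = 0.05)  # strong effect: modest n per group
power.prop.test(p1 = 0.30, p2 = 0.35, power = 0.80, sig.level = 0.05)  # weak effect: n per group grows sharply
```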

Overview of Contributions
The particular view and methodological requirements of empirical multimodality research that we have laid out in this introduction are meant to provide a more general thematic frame in which to place the typically much more specific discussions and case studies presented in the remainder of this book. In his chapter entitled Dimensions of Materiality, John A. Bateman's starting point is the long-standing tradition of attributing materiality a central role in multimodality research. The development of an empirically robust account of materiality is then central. To this end, he argues that 'external languages of description' (cf. Maton & Chen 2016) are needed for securing and organizing proper analytical 'access' to data. On the basis of previous work in Bateman et al. (2017), the chapter construes materiality itself as such an external language of description, and introduces temporality, space, role, and transience as its central characteristics. This extended view of materiality is finally related to the semiotic purposes of communication in the broad framework of multimodality adopted, and illustrated by the example of three different communicative situations and their comparison with regard to their material canvases. Thus, the chapter provides a systematic approach to materiality as a "reliably recoverable" (Bateman 2019: 303; see above) theoretical construct. The author's contribution is thus an important step toward robust empirical methodologies and toward achieving a close connection between theory and data.
Pursuing similar aims, John A. Bateman and Tuomo Hiippala's chapter, From Data to Patterns, sheds light on the concept and practice of modeling in empirical research. With the aim of crossing the disciplinary boundaries in multimodality research, the authors suggest an understanding of models as specifically structured descriptions of patterns and regularities from a semiotically oriented perspective on iconicity, following Peirce. On this theoretical basis, the empirical procedure of moving from theory to models to data and back is further discussed and exemplified by a critical evaluation of certain types of modeling procedures and techniques for formulating and evaluating models.
The following two chapters then combine theoretical discussion with more extensive practical work. Barbara Tversky and Angela Kessell's chapter, entitled Thinking in Action, focuses on the mapping of thoughts to non-verbal and verbal expressive resources. Based on their empirical work, the authors demonstrate that gestures and marks on a sheet of paper have many important properties in common. They furthermore show that gesture supports the direct (iconic) expression of actions. Engaging the embodied perception of action, as well as the visual, in this way offers considerable benefits over language alone. The authors consequently argue for a combined network of gesture, action, the designed world, and abstraction, which they call 'spraction'. Despite being a reprint of a previously published journal article (2014), the contribution offers invaluable insights that are as topical today as then, particularly in the context of current multimodality discussions.
Finally in Part II, Ralph Ewerth, Christian Otto, and Eric Müller-Budack's chapter, Computational Approaches for the Interpretation of Image-Text Relations, discusses and demonstrates computational approaches to the processing and interpretation of text-image relations from a computer science perspective. Building on previous work on the classification of text-image relations, and taking into account approaches from linguistics and communication studies as well, the authors define computable dimensions for different types of relations, namely cross-modal mutual information, semantic correlation, and status relation, and use these to derive eight image-text classes and to relate them to existing taxonomies. The chapter furthermore reports on experimental results generated through an automatic classification of these text-image relations using deep learning approaches, and presents a more differentiated model of the dimension of cross-modal information for image-text pairs in the news domain. The approaches developed contribute to expanding tried-and-tested empirical methodologies for multimodality research with automated computer-based processing that helps bridge the 'semantic gap' between text and image.
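The underlying logic can be made concrete: combining a small number of computable dimensions spans a finite space of possible image-text profiles, of which the chapter's eight named classes select a subset. The following Python sketch illustrates only this combinatorial idea; the value sets are invented stand-ins, and the actual operationalization and class mapping are the authors' own.

```python
# Hypothetical sketch: how a few computable dimensions span a space of
# image-text profiles. Value sets below are illustrative assumptions;
# the chapter maps only some profiles onto its eight named classes.
from itertools import product

dimensions = {
    "cross_modal_mutual_information": ("low", "high"),
    "semantic_correlation": ("negative", "none", "positive"),
    "status_relation": ("equal", "image_subordinate", "text_subordinate"),
}

for combo in product(*dimensions.values()):
    profile = dict(zip(dimensions.keys(), combo))
    print(profile)  # each profile could be assigned to at most one class
```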

Part III - Empirical Inroads: Case Studies and Results
In his chapter "I can't see why you're laughing": Multimodal Analysis of Emotionalized Political Debate, Andreas Rothenhöfer places an extract from a UK TV news programme, showing a short interaction between a politician and a news anchor, under the analytic microscope. This extract received considerable public attention due to the politician's supposedly controversial smirk-like reaction. To understand the interaction and its take-up in more detail, Rothenhöfer combines a mixed-methods approach to facial expression analysis with a qualitative pragmatic perspective, with the aim of analysing the reception, co-construction, and reframing of the short interview sequence as presented on Twitter. Using the computational platform iMotions and the analytical software Affectiva, the author demonstrates the usefulness and applicability of biometric analysis for reconstructing and distinguishing behavioral interaction chains from more general mood or attitude aspects, and for supporting or contradicting the perception of such interactions in public discourse. The study thus also contributes to the further exploration of software-based tools and their productive combination with qualitative approaches.
In their chapter entitled A Corpus-based Approach to Color, Shape, and Typography in Logos, Christian Mosbaek Johannessen, Mads Lomholt Tvede, Kristoffer Claussen Boesen, and Tuomo Hiippala then present a data-driven corpus study of color, shape, and typography in corporate logos. With the aim of addressing the 'social style' of logos as representations of brands and branding, the authors operationalize their analysis of the graphic canvases of 50 logos from the oil industry and from non-governmental environmental organizations along the dimensions of shape, color, and typography. The analytical framework reflects 14 types of material properties and positions them as variables whose values represent stylistic choices in logo design. To evaluate the interaction of these variables, the authors employ Principal Component Analysis (PCA) to exhibit patterns of variation in the corpus. Results show that certain groups of logos differ significantly in their specific uses of shape and color, for example, and suggest that this variation depends on the organization or industrial sector the logos are used to represent.
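As a rough indication of how such an analysis can be set up, the following Python sketch runs a PCA over a matrix of logos by material-property variables. The data here are randomly generated stand-ins, and the matrix shape merely mirrors the 50 logos and 14 variables mentioned above.

```python
# Sketch of a PCA over logo features: 50 logos x 14 material-property
# variables. The data are random stand-ins, not the authors' corpus.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 14))               # placeholder measurements

scaled = StandardScaler().fit_transform(features)  # put variables on one scale
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)             # coordinates of each logo

# How much of the corpus variation the first two components capture:
print(pca.explained_variance_ratio_)
# Plotting `components` and coloring the points by sector would then show
# whether logos cluster by sector, i.e., whether style varies with sector.
```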
In the following contribution, entitled Pixel Surgery and the Doctored Image, Hartmut Stöckl draws on a corpus of 232 print advertisements from Lürzer's Archive in order to investigate the function of computer-generated images when used to construct multimodal arguments. Combining scholarship in pictorial theory, visual rhetoric, and multimodal argumentation, the author develops an extensive typology of manipulations of visual structure ('design operations') and investigates their rhetorical potential, their relational propositions and, ultimately, the argument types they support. Stöckl's contribution features a detailed code book, which noticeably increases the study's degree of objectivity. The relative frequencies of the annotated categories are then interpreted as indicating prototypical and functionally effective strategies of image design and multimodal argumentation in advertising.
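The quantitative step described here, deriving relative frequencies from code-book annotations, is straightforward to reproduce. A minimal sketch with pandas might look as follows; the design-operation labels are invented for illustration and are not Stöckl's categories.

```python
# Relative frequencies of annotated categories across a corpus.
# The design-operation labels below are invented for illustration.
import pandas as pd

annotations = pd.Series(
    ["substitution", "addition", "substitution", "deletion",
     "addition", "substitution"]
)
# normalize=True turns raw counts into proportions of all annotations
print(annotations.value_counts(normalize=True))
```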
Next, Jiaping Kang and Zhanhao Jiang's chapter, entitled Multimodal Discourse Analysis Based on the GeM Model, presents a thorough application of the Genre and Multimodality framework (originally designed for analyzing page-based documents: cf. Bateman 2008) to a small corpus of 10 U.S.-American and Chinese environmental protection posters. Using XML coding to annotate the posters' basic compositional units, their layout, and their rhetorical structure, and employing the GeM tools developed for further processing by Hiippala (2015), the authors provide a contrastive analysis of the semiotic resources used in the two sets of posters. The results show, for example, that the distribution of verbal and visual units is very similar across the two cultures, but that there is variation in the use of language and typography. Despite its comparably small sample size and its leaning towards qualitative empirical research, the study clearly illustrates the gains of a detailed multi-layer analysis that ties lower-level, data-sensitive annotations to higher-level analytical concepts, such as rhetorical relations, and so stands well as a motivation for potentially larger-scale studies.
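To give a flavor of what such multi-layer XML annotation involves, the following Python sketch parses a schematic, GeM-style base layer for a single poster. The element and attribute names are simplified stand-ins rather than the exact GeM DTD.

```python
# Schematic GeM-style base layer for one poster, parsed with the
# standard library. Element/attribute names are simplified stand-ins.
import xml.etree.ElementTree as ET

poster = """
<gem-base>
  <unit id="u-01" type="text">Protect our rivers</unit>
  <unit id="u-02" type="image">photograph of a polluted riverbank</unit>
  <unit id="u-03" type="text">Act now.</unit>
</gem-base>
"""

root = ET.fromstring(poster)
for unit in root.iter("unit"):
    print(unit.get("id"), unit.get("type"), "->", (unit.text or "").strip())
# Layout and rhetorical-structure layers would then refer back to these
# unit ids, tying higher-level analysis to the base annotation.
```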
Adopting a quite different theoretical approach, Loli Kim and Jieun Kiaer draw in their chapter, Conventions in How Korean Films Mean, on the framework of Segmented Film Discourse Representation Structures (Wildfeuer 2014) to conduct a pilot study of the nature and content of 'final goodbye' events in the contemporary South Korean films Old Boy (2003), Sympathy for Lady Vengeance (2005), and The Man from Nowhere (2010). By formally specifying discourse segments and discourse relations in several relevant film scenes, the authors identify recurring patterns of filmic configurations that can be assumed to function as conventions across the three films analyzed. The results show that such empirically supported testing of existing methodological frameworks adds considerable detail and precision to the understanding of how meaning in film is constructed.
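In an analysis of this kind, each film segment receives a label and the interpretation is made explicit as discourse relations holding between labeled segments. Purely for illustration (the segments below are invented and not taken from the three films), a fragment of such notation might be rendered as:

```latex
% Invented illustration of SFDRS-style notation: two labeled film
% segments and a discourse relation inferred between them.
\pi_1 : [\text{character hands over a small gift}] \\
\pi_2 : [\text{recipient watches the character walk away}] \\
\mathit{Result}(\pi_1, \pi_2)
```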
Finally, Dušan Stamenković and Janina Wildfeuer's contribution, entitled An Empirical Multimodal Approach to Open-World Video Games, reports on a case study of the video game Grand Theft Auto V (Rockstar North 2013). The authors present their extensive annotation work covering all 80 main missions of the game and draw out a comprehensive inventory of the basic semiotic elements used in these missions. These elements are further analysed with regard to their frequency of occurrence in order to show statistically attestable associations between variables (correlations). On this basis, Stamenković and Wildfeuer demonstrate how a diversity of combined features structures the experience of playing the game and guides and instructs players within an essentially open game world. They also show how sufficient empirical evidence for specific semiotic elements in complex multimodal artefacts establishes a more stable ground for investigating hypotheses about meaning-making in these artefacts.
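The final quantitative step, testing for associations between per-mission frequency variables, can be sketched as follows in Python. The variable names and counts are invented placeholders, not the authors' data.

```python
# Testing for an association between two per-mission frequency
# variables (invented placeholder counts; 10 missions shown).
from scipy.stats import spearmanr

driving_segments  = [4, 7, 2, 9, 5, 6, 3, 8, 1, 5]
on_screen_prompts = [5, 9, 3, 8, 6, 7, 2, 9, 2, 6]

rho, p_value = spearmanr(driving_segments, on_screen_prompts)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A small p-value together with a sizable rho would count as a
# statistically attestable association between the two elements.
```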

Framing Conclusions
Despite the natural interest of multimodality research in conducting data-based research, the field is certainly still far from realizing its full empirical potential. In this introduction, we have discussed the gradual adoption of an increasing range of quantitative work as well as larger-scale studies, and have argued that this now constitutes an important avenue for advancing the field. At the same time, however, this must be done while still granting more exploratory work a permanent and prominent position in the overarching research agenda. Given the breadth of multimodality concerns, there will always be a need for exploration: what we suggest, however, is that even exploration can be undertaken with an eye to subsequent, less exploratory investigations in depth.
Consequently, we have also argued that, in order to achieve a more robust empirical grounding for multimodality research across the board, both qualitatively and quantitatively oriented studies would benefit from being guided by five core quality criteria for good empirical practice: completing the feedback loop (from theory to data and back), implementing the principles of objectivity, reliability, and validity, and acknowledging that the results generated will necessarily remain tentative in nature. We believe that following these principles in future research is essential if we are to continue our productive investigations of increasingly complex artefacts and performances, to further strengthen our theoretical and methodological frameworks, and to ultimately encourage an even more intense exchange among the diverse communities within our emerging discipline, and beyond. Particular examples of and directions for these developments are evident in all of the individual contributions to the volume.

Fig. 1: Proportion of articles logged as broadly 'empirical' in the sense of being concerned with data. Each graph shows the number of articles logged per year for each journal and a fitted linear trendline with standard error indicated by shading (graphed with R 'ggplot2'). The trendline simply places a straight line approximating the data (see Bateman & Hiippala, this volume) to show broad relationships between proportions of empirical papers and years.

Fig. 2: Distribution of empirical methods and scale of data sets per year for each journal, expressed as a proportion of the total number of articles tagged as broadly empirical each year. The grayed-out areas are years in which the journal in question did not appear or which lay outside our dataset. (Graphed with R 'ggplot2'.)