An assessment of email and spontaneous dialog visualizations

https://doi.org/10.1016/j.ijhcs.2012.02.002

Abstract

Two experiments examined the effectiveness of visualizations of unstructured texts. The first experiment presented transcriptions of unrehearsed dialog and the second used emails. Both experiments showed an advantage in overall performance for semantically structured two-dimensional (2D) spatialized layouts, such as multidimensional scaling (MDS), over structured and non-structured list displays. The second experiment also demonstrated that this advantage is not simply due to the 2D nature of the display, but to the combination of the 2D display and the semantic structure underpinning it. Without this structure, performance fell to that of a Random List of documents. The effect of document type in this study and in Butavicius and Lee's (2007) study on visualizations of news articles may be partly explained by a shift in bias along a speed-accuracy trade-off. At one extreme, users were accurate but slow in answering questions based on the dialog texts while, at the other extreme, users were fast but relatively inaccurate when responding to queries about emails. Similarly, users could respond accurately using the non-structured list interface; however, this came at the cost of very long response times and was associated with a technique whereby participants navigated by clicking on neighboring document representations. Implications of these findings for real-world applications are discussed.

Highlights

► Two user studies assessed list and spatialized displays (structured and unstructured). ► The first study used transcriptions of spontaneous speech and the second used emails. ► Overall, spatialized displays outperformed list-based displays. ► Without content-based structuring, the spatialized display had no advantage over a list. ► Users could be very accurate with an unstructured list but only at the cost of speed.

Introduction

Document visualizations are graphical representations of a set of text documents. Their aim is to convey trends and patterns that would be impossible, or very time consuming, to ascertain by examining the individual documents alone. Visualization tools are particularly beneficial when the set of documents to be analyzed is very large. As White, Muresan and Marchionini (2006) have pointed out, document visualization may benefit exploratory data analysis when (1) the search problem is not well defined, (2) the user is not familiar with the problem domain and (3) multiple points of view need to be considered in investigating the documents. For these reasons, document visualization tools have gained increasing popularity not only in intelligence gathering for security, defense and law enforcement (e.g., Stasko et al., 2008) but also in detecting trends in science, politics and public opinion (e.g., Clavier and El Ghaoui, 2008, Mothe et al., 2006, Powell, 2004).

A common approach to document visualization involves proximity-based techniques. A specific example of such an approach is a point-based spatialized display (also known as a spatialization), in which each document is represented by an icon and the distance between the icons represents the similarity between the documents (e.g., ACQUAINTANCE and PARENTAGE: Liu et al., 2000; LEXIMANCER: Smith, 2000; TEXT GARDEN DOCUMENT ATLAS: Fortuna et al., 2006). That is, the icons for documents that are determined to be similar are positioned closer to each other in the display. This type of layout is consistent with Montello et al.'s (2003) ‘First Law of Cognitive Geography’, which states that “people believe closer things to be more similar than distant things” (p. 317). In two relatively large-scale user studies (N=45 and 48), these authors provided empirical evidence for this principle. The success of these displays possibly lies in the ability of the human visual system to easily detect patterns in the display, such as clusters and outliers. As Brusco (2007) has pointed out, the ability to partition such point arrays into clusters is one of many visual combinatorial optimization problems for which the human visual system appears to be very well adapted (see also Vickers et al., 2001).
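The distance-similarity principle described above can be checked numerically for any given layout. The Python sketch below is illustrative only: the function names `distance_similarity_fit` and `pearson` are hypothetical, and it assumes similarities are scaled to [0, 1] so that dissimilarity can be taken as 1 - similarity. It correlates on-screen distances with dissimilarities over all document pairs; a value near 1 suggests that closer icons do belong to more similar documents, while a value near 0 suggests the layout carries little semantic structure.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def distance_similarity_fit(layout, similarity):
    """Correlate icon distance with document dissimilarity (hypothetical helper).

    layout: list of (x, y) icon positions, one per document.
    similarity: symmetric matrix of pairwise similarities, assumed to lie in [0, 1].
    """
    distances, dissimilarities = [], []
    for i, j in combinations(range(len(layout)), 2):
        distances.append(math.dist(layout[i], layout[j]))
        # Assumption: dissimilarity is simply 1 - similarity.
        dissimilarities.append(1.0 - similarity[i][j])
    return pearson(distances, dissimilarities)
```

A correlation is used rather than raw distances because only the ordering and relative spacing of icons matter to the viewer, not the absolute screen units.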

The reliance of spatialized displays on the capabilities and limitations of the human visual processing system, and the need to provide empirical evidence of their real-world effectiveness, suggest that it is important to study user behavior. In particular, it seems important to study whether, and how, visualizations facilitate performance on the information-handling tasks they are designed to support. As a result, a growing body of studies has investigated the effectiveness of visualizations empirically (e.g., Butavicius and Lee, 2007, Cribbin and Chen, 2001, Don et al., 2007, Fabrikant et al., 2004, Fabrikant et al., 2006, Tory and Möller, 2004, Tory et al., 2007, Lee et al., 2003, Sanyal et al., 2009, Ware, 2000, Westerman et al., 2005, Westerman and Cribbin, 2000).

While the notion of visualizations such as spatialized displays may have strong intuitive appeal, the empirical support for a performance advantage over traditional list-type displays is not unequivocal. Certainly, there is strong evidence that performance with 2D visualizations is better than (or at least no worse than) performance with 3D visualizations (Fabrikant, 2002, Newby, 2002, Sebrechts et al., 1999, Westerman and Cribbin, 2000). In addition, Butavicius and Lee (2007) and Tory et al. (2007) found better performance with spatialized displays than with list and landscape displays. However, Cribbin and Chen's (2001) user study found better performance, at least for some tasks, for a list-based display than for a spatialized display and two network displays that also contained links between related documents. Similarly, Hornbæk and Frøkjær (1999) found no difference in the number of documents retrieved or marked as relevant between text-only search and a visualization display known as a thematic map (essentially a spatialized display with “theme” words added to the display), but participants took longer when using the thematic map. Qualitative analysis of participants' verbal descriptions of their thoughts and actions in the study suggested that users tended to get ‘lost’ in the thematic map displays. Finally, Swan and Allan (1998) found only modest improvements for more sophisticated spatialized displays over text-only interfaces. As Newby (2002) suggested, it is still unclear whether “visual interfaces for IR can be more effective than text-based interfaces” (p. 50).

Two factors may explain the variation in findings regarding list versus spatialized displays in the literature. Firstly, many of the user studies reported in the literature are based on small sample sizes. For example, Swan and Allan (1998), Cribbin and Chen (2001) and Hornbæk and Frøkjær (1999) used sample sizes of 24, 15 and 6, respectively. While such smaller-scale studies can be very informative, their results are harder to generalize and they have less statistical power. Secondly, there is a great deal of variation between studies in the tools being tested, with many studies employing more complete visualization tools with different combinations of functionality (e.g., INFOSKY: Granitzer et al., 2004; YAVI: Newby, 2002). As a result, the potential influence of a range of interface and functionality variables is not well controlled across the different experiments.
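To make the statistical-power point concrete, the sketch below approximates the power of a two-sided test for several sample sizes, including 6, 15 and 24. It is illustrative only: it treats each study as a simple paired (within-subjects) comparison, assumes a ‘medium’ standardized effect of 0.5 following Cohen's (1988) conventions, and uses a normal approximation rather than an exact t-test calculation, so it somewhat overstates power at small N. The function name `approx_power` is hypothetical, and the resulting figures are not estimates for the cited studies.

```python
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided paired test (normal approximation).

    effect_size: standardized mean difference (assumed, e.g. 0.5 = 'medium').
    n: number of participants.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = effect_size * n ** 0.5      # expected test statistic under H1
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (6, 15, 24, 48):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, power for a medium effect rises steeply between N=6 and N=48, which is why small studies can easily miss real differences between displays.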

In this paper, we seek to address the first issue by using comparatively large sample sizes (N=48 and 49) for our two experiments, and by using a repeated-measures design to increase statistical power. We attempt to address the second issue by testing specific components of a visualization interface rather than a complete visualization tool. A final feature of our approach is that we use human document similarity judgments, rather than machine substitutes, to construct the displays. This is particularly important given the inconsistencies between human and machine judgments demonstrated by Lee et al. (2005). The use of human similarities allows us to test the visual component of the visualization in isolation from the quality of the underlying document similarities (for further discussion see Butavicius and Lee, 2007).
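Why a repeated-measures design increases power can be sketched in the same spirit. The assumptions here are loud ones: a normal approximation, only two display conditions (rather than the several compared in these experiments), a true standardized difference `d` between conditions, and a correlation `rho` between each participant's scores in the two conditions. Within subjects, the effect on the paired differences becomes d / sqrt(2(1 - rho)), and every participant contributes to both conditions. All function names are hypothetical.

```python
from statistics import NormalDist

_Z = NormalDist()


def _two_sided_power(noncentrality, alpha=0.05):
    """Power of a two-sided z-test given the expected test statistic."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(noncentrality - z_crit) + _Z.cdf(-noncentrality - z_crit)


def between_subjects_power(d, total_n, alpha=0.05):
    """Two independent groups, each with total_n / 2 participants."""
    per_group = total_n / 2
    return _two_sided_power(d * (per_group / 2) ** 0.5, alpha)


def within_subjects_power(d, total_n, rho, alpha=0.05):
    """Every participant uses both displays; rho is the between-condition correlation."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    d_z = d / (2 * (1 - rho)) ** 0.5    # effect size on the paired differences
    return _two_sided_power(d_z * total_n ** 0.5, alpha)
```

With the same number of recruited participants, the within-subjects version is more powerful whenever rho is non-negative, and the gain grows as participants' scores across conditions become more strongly correlated.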

Our approach is similar to ‘de-featured’ systems (Morse and Lewis, 1997) and BASSTEP methodologies (Morse et al., 2002), in which only basic features are tested or introduced at each stage of the development process. It differs from these methodologies in that we use a controlled experimental framework for our testing. As Walenstein (2002) has pointed out, testing isolated components of software is necessary to comprehend “the abstract principles of the support provided by the tools rather than the interfering details of the particular prototype” (p. 39). This means the findings of this paper may inform the design of a wide range of visualization and information retrieval tools that employ spatialized and list-based displays.

Our study builds on one reported by Butavicius and Lee (2007), who evaluated the performance of 80 participants in an experiment using four different visualization techniques applied to news articles. The displays were a Random List, an Ordered List and two two-dimensional visualizations using the multidimensional scaling (MDS: Shepard, 1980) and Isomap (Tenenbaum et al., 2000) layout algorithms. All but the Random List display were constructed using human judgments of document similarity from Lee et al. (2005) to ensure that they were structured using a cognitive model of the document space. In the Butavicius and Lee (2007) study, participants performed best (in the sense that they were faster and accessed fewer documents) when using the structured displays, and the two-dimensional (2D) spatialized displays outperformed the one-dimensional (1D) lists.
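The two layout algorithms named above can be sketched compactly. The code below is a minimal Python/NumPy illustration, not the implementation used in these studies: it uses classical (Torgerson) metric MDS, which may differ from the MDS variant described by Shepard (1980) that the authors applied, and a textbook Isomap (k-nearest-neighbor graph, shortest-path geodesic distances, then classical MDS). It assumes the human similarity judgments have already been converted into a symmetric dissimilarity matrix; `classical_mds`, `isomap` and `n_neighbors` are hypothetical names.

```python
import numpy as np


def classical_mds(dissim, dims=2):
    """Classical (Torgerson) MDS: embed an n x n dissimilarity matrix in `dims` dimensions."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                   # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]      # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale              # one row of coordinates per document


def isomap(dissim, n_neighbors=5, dims=2):
    """Isomap: neighborhood graph, geodesic (shortest-path) distances, then classical MDS."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        order = np.argsort(d[i])
        nearest = order[order != i][:n_neighbors]  # k nearest documents, excluding i itself
        graph[i, nearest] = d[i, nearest]
        graph[nearest, i] = d[i, nearest]           # keep the graph symmetric
    for k in range(n):                              # Floyd-Warshall all-pairs shortest paths
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if np.isinf(graph).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    return classical_mds(graph, dims)
```

Because Isomap replaces direct dissimilarities with path lengths through a neighborhood graph, it can preserve local cluster structure in a document set that lies on a curved manifold, at the cost of a free parameter (`n_neighbors`) that must be chosen so that the graph stays connected.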

Our study extends Butavicius and Lee's (2007) paper in two main ways. Firstly, in the previous experiment, all the experimental conditions with a 2D layout were structured using algorithms operating on human judgments of document similarity. It is therefore not possible to rule out the hypothesis that the performance achieved in these conditions was simply due to the documents being laid out in the 2D plane. Westerman and Cribbin (2000) found that increasing the semantic variance accounted for by 2D solutions in spatialized displays (from 50% to 70% to 90%) improved performance on a search task. However, neither Butavicius and Lee (2007) nor Westerman and Cribbin (2000) allows us to determine whether simply laying document representations out randomly in a 2D plane, without any structuring according to semantic information, may still be advantageous to the user (or, conversely, whether such visualizations are in fact worse than unstructured list-based displays). For example, a random 2D spatialized display may allow a user to remember the locations of documents better than an unordered list does. To address this issue, we include a random 2D spatialized display condition in the second experiment in this paper. In so doing, we also address another issue in real-world applications of visualization tools: how a visualization of document space will perform when there is little semantic structure to be found (i.e., the documents are all from disparate topics). In many intelligence and exploratory applications of visualization tools, the document corpora change frequently, so the semantic structure of the space changes rapidly and unpredictably. In addition, distinct semantic structure may be less apparent in visualizations of email and spontaneous speech because, as we discuss shortly, the topicality of such texts can vary both between and within documents. It is therefore useful to determine whether visualization tools provide any advantage or disadvantage over conventional list-based displays in these ‘worst-case’ scenarios.
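The manipulation discussed above can be illustrated with two small helpers. `random_layout` (hypothetical) places icons in 2D while ignoring document content, which is the kind of control condition that separates the effect of a 2D display from the effect of semantic structure. `variance_accounted_for` (also hypothetical) computes one common definition of the share of variance captured by a k-dimensional classical MDS solution: the sum of the k largest positive eigenvalues divided by the sum of all positive eigenvalues. Westerman and Cribbin (2000) may have used a different measure, so this is an assumption for illustration.

```python
import numpy as np


def random_layout(n_docs, seed=None):
    """Place document icons uniformly at random in the unit square, ignoring content."""
    rng = np.random.default_rng(seed)
    return rng.random((n_docs, 2))


def variance_accounted_for(dissim, dims=2):
    """Share of positive eigenvalue mass captured by the top `dims` classical MDS axes."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                   # double-centered squared dissimilarities
    eigvals = np.sort(np.linalg.eigvalsh(b))[::-1]
    positive = eigvals[eigvals > 0]               # negative eigenvalues carry no Euclidean variance
    return positive[:dims].sum() / positive.sum()
```

A random layout is, by construction, unrelated to the dissimilarities, so any advantage it shows over a list would have to come from the 2D presentation itself rather than from semantic structure.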

The second way in which the current paper builds on Butavicius and Lee (2007) is by examining emails and transcriptions of telephone conversations. As with most user studies of visualizations, Butavicius and Lee (2007) used well-edited documents. Many previous assessments of visualizations have used similar documents in the form of news articles (e.g., Cribbin and Chen, 2001, Granitzer et al., 2004; Experiment II: Newby, 2002) and journal articles (e.g., Hornbæk and Frøkjær, 2003). These sorts of articles are also used extensively to test information retrieval tools in benchmark tests and competitions (Voorhees and Harman, 2005). However, it remains to be seen how well visualization techniques perform when faced with more spontaneous, less polished texts such as unrehearsed conversations and emails.

Spontaneous speech and email are similar to a range of newer communication media involving computer-to-computer interactions including web logs (colloquially known as “blogs”), Internet forums and instant messaging. These fora are increasing in popularity and represent a wealth of information that lends itself to exploration using visualization tools. All of these differ from professionally edited news articles in a number of ways including:

  • a. Linguistic features: particularly in spoken dialog, the presence of features such as speech repairs (Levelt, 1983) and discourse markers (Schiffrin, 1987) can make such language difficult to interpret (Heeman and Allen, 1997).

  • b. Vocabulary: more conversational or informal communications often feature slang and more fluid language use, including specialized vocabulary, emoticons, acronyms and abbreviations.

  • c. Information density: these documents are often characterized by their “loose, unstructured, garrulous or unedited quality” and may be considered ‘information poor’ in comparison to documents that are engineered by communication experts (Toffler, 1970, p. 155). In contrast, engineered documents such as articles, scripts and formal speeches are “highly purposive … [and] pre-processed to eliminate unnecessary repetition” (Toffler, 1970, p. 155).

  • d. Breadth of topicality: rather than being focused on a particular topic, these less-structured documents can cover a range of different topics.

These characteristics can make such communications difficult to analyze for both humans (Hornbæk and Frøkjær, 2003, Ratté et al., 2007) and computers.

Given the dialogic character of these media, many tools for visualizing such archives have centered on presenting and analyzing patterns in the metadata, e.g., the sender, recipient and time/date stamp information associated with an email (e.g., MAILVIEW: Frau et al., 2005) or the author information and threads in newsgroups and web forums (e.g., CONVERSATIONAL LANDSCAPE and LOOM: Donath et al., 1999). Subjective assessments of such visualizations, when used to display hierarchical, correlational and temporal patterns in email archives, have been favorable (Perer et al., 2006, Perer and Smith, 2006). There have also been efforts to represent the content of such communications. Tools documented in the literature contain displays for visualizing the entities (i.e., people, places, dates and organizations) within an email corpus (JIGSAW: Görg and Stasko, 2008), authors' mood (e.g., CONVERSATIONAL LANDSCAPE and LOOM: Donath et al., 1999), content similarities between individual blog messages via spatializations (INSPIRE: Gregory et al., 2007; VIZBLOG: Pérez-Quiñones et al., 2007) and topical clustering of emails (BUZZTRACK: Cselle et al., 2007). However, despite this increased interest, to date we are not aware of any empirical study that has tested the performance of visualizations of such communications.

In this paper, we present two experiments that examine how well several proximity-based visualization techniques assist a user in the analysis of spontaneous speech transcripts and email texts. In these displays, we are interested in representing the content of a collection of texts across a number of individuals. This type of display could support an analyst who, for the purposes of business, political or security intelligence gathering, is exploring a corpus of unstructured, spontaneous texts from multiple authors to understand the content of the communications. This type of exploratory analysis of data and documents can play an important role in the work of an intelligence analyst (Gersh et al., 2006, Pirolli and Card, 2005).


Experiment I: spontaneous speech

In the first experiment, we compared visualization performance using transcriptions of unrehearsed telephone conversations. The types of visualization techniques were similar to those used in Butavicius and Lee (2007) including a Random List, a structured list, a 2D display based on the Isomap algorithm (Tenenbaum et al., 2000) and another 2D display based on multidimensional scaling (MDS: Shepard, 1980). However, as mentioned above, we used transcriptions of unrehearsed, telephone

Experiment II: Enron emails

Experiment II differs from Experiment I in two main ways. Firstly, the document set consists of emails from the Enron Corporation data set rather than transcriptions of spoken dialog. During the legal investigation of the Enron Corporation, the Federal Energy Regulatory Commission released a large collection of actual emails from the corporation, containing over 600,000 messages, from approximately 150 employees (Klimt and Yang, 2004). These emails not only contain messages relevant to the

Comparison between experiments

The three visualizations common to the two experiments in this paper and to the experiment in Butavicius and Lee (2007) showed a consistent trend: MDS outperformed the Ordered List, and the Ordered List was superior to the Random List. However, the performance advantage was expressed differently across the studies, in terms of either speed or accuracy. Fig. 15 shows the relative performance of the three common visualizations in terms of speed and accuracy.
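Because the advantage of one display can appear as speed in one study and as accuracy in another, it can help to combine both into a single score. One conventional option, shown here as an assumption rather than the analysis reported in this paper, is the inverse efficiency score: the mean response time on correct trials divided by the proportion of correct responses (lower is better). The function name `inverse_efficiency` is hypothetical; it expects parallel per-trial lists of response times and correctness flags for one participant and one display.

```python
from statistics import mean


def inverse_efficiency(response_times, correct):
    """Inverse efficiency score: mean RT on correct trials / proportion correct.

    response_times: per-trial response times (any consistent unit).
    correct: per-trial booleans, True if the answer was correct.
    Lower scores indicate better combined speed and accuracy.
    """
    if len(response_times) != len(correct) or not correct:
        raise ValueError("need one correctness flag per response time")
    accuracy = sum(correct) / len(correct)
    if accuracy == 0:
        raise ValueError("no correct trials; inverse efficiency is undefined")
    correct_rts = [rt for rt, ok in zip(response_times, correct) if ok]
    return mean(correct_rts) / accuracy
```

Inverse efficiency is usually considered interpretable only when error rates are low, so it complements rather than replaces separate plots of speed and accuracy.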

Fig. 16 demonstrates that

Conclusion

In two studies, we found that the 2D visualizations structured according to a cognitive representation of the underlying document similarities outperformed a 1D visualization of the same similarities when applied to unstructured texts. Both of these types of displays performed better than an unstructured list. These findings parallel those for visualizations of highly structured news articles (Butavicius and Lee, 2007). In the second experiment of this paper we also showed that the cognitive

Acknowledgments

We wish to thank Chlöe Mount, Joanne Spadavecchia and Andrew Brolese for conducting the experiments, Chris Jones for his work on the visualization interface and Ian Coat, Glen Smith and several anonymous reviewers for their assistance and helpful suggestions. Daniel Navarro was supported by an Australian Research Fellowship (ARC Grant DP-0773794).

References (65)

  • Brusco, M.J., 2007. Measuring human performance on clustering problems: some potential objective criteria and experimental research opportunities. Journal of Problem Solving.
  • Clavier, S.M., El Ghaoui, L.M., 2008. Breaking world news: the computerized dynamic visualization of aggregate...
  • Cohen, J., 1988. Statistical Power Analysis for the Behavioural Sciences.
  • Cox, T.F., et al., 1994. Multidimensional Scaling.
  • Cribbin, T., et al., 2001. Visual-spatial exploration of thematic spaces: a comparative study of three visualization models. Electronic Imaging 2001: Visual Data Exploration and Analysis VIII.
  • Cselle, G., Albrecht, K., Wattenhofer, R., 2007. Buzztrack: topic detection and tracking in email. In: IUI'07:...
  • Don, A., Zheleva, E., Gregory, M., Tarkan, S., Auvil, L., Clement, T., Shneiderman, B., Plaisant, C., 2007. Discovering...
  • Donath, J., Karahalios, K., Viegas, F., 1999. Visualizing conversation. In: Nunamaker, J.F., Jr. (Ed.), Proceedings of...
  • Fabrikant, S.I., 2002. Spatial Metaphors for Browsing Large Data Archives. Ph.D. Thesis. University of...
  • Fabrikant, S.I., et al., 2006. The distance-similarity metaphor in region-display spatializations. IEEE Computer Graphics and Applications.
  • Fabrikant, S.I., et al., 2004. The distance-similarity metaphor in network-display spatializations. Cartography and Geographic Information Science.
  • Fortuna, B., et al., 2006. Visualization of text document corpus. Informatica.
  • Frau, S., Roberts, J.C., Boukhelifa, N., 2005. Dynamic coordinated email visualization. In: Václav Skala (Ed.),...
  • Frøkjær, E., Hertzum, M., Hornbæk, K., 2000. Measuring usability: are effectiveness, efficiency, and satisfaction...
  • Gersh, J., et al., 2006. Supporting insight-based information exploration in intelligence analysis. Communications of the ACM.
  • Godfrey, J.J., et al., 1997. SWITCHBOARD-1 Transcripts LDC93S7-T. CD-ROM.
  • Görg, C., Stasko, J., 2008. Jigsaw: investigative analysis on text document collections through visualization. In:...
  • Granitzer, M., Kienreich, W., Sabol, V., Andrews, K., Klieber, W., 2004. Evaluating a system for interactive...
  • Gregory, M.L., Payne, D., McColgin, D., Cramer, N., Love, D., 2007. Visual analysis of weblog content. In: Proceedings...
  • Griffiths, T.L., et al., 2004. Finding scientific topics. Proceedings of the National Academy of Sciences.
  • Heeman, P.A., Allen, J.F., 1997. Intonational boundaries, speech repairs, and discourse markers: modelling spoken...
  • Hornbæk, K., Frøkjær, E., 1999. Do thematic maps improve information retrieval? In: Sasse, A., Johnson, C. (Eds.),...