An assessment of email and spontaneous dialog visualizations
Highlights
► Two user studies assessed list and spatialized displays (structured and unstructured).
► The first study used emails and the second used transcriptions of spontaneous speech.
► Overall, spatialized displays outperformed list-based displays.
► Without content-based structuring, the spatialized display had no advantage over a list.
► Users could be very accurate with an unstructured list but only at the cost of speed.
Introduction
Document visualizations are graphical representations of a set of text documents. The aim of these visualizations is to convey trends and patterns that would be impossible, or very time consuming, to ascertain based on an examination of the individual documents alone. Visualization tools are particularly beneficial when the number of documents in the set to be analyzed is very large. As White, Muresan and Marchionini (2006) have pointed out, document visualization may be of benefit for exploratory data analysis when (1) the search problem is not well defined, (2) the user is not familiar with the problem domain and (3) multiple points of view need to be considered in investigating the documents. For these reasons, document visualization tools have gained increasing popularity not only in intelligence gathering for security, defense and law enforcement (e.g., Stasko et al., 2008) but also in detecting trends in the domains of science, politics and public opinion (e.g., Clavier and El Ghaoui, 2008, Mothe et al., 2006, Powell, 2004).
A common approach to document visualization involves proximity-based techniques. A specific example of such an approach is a point-based spatialized display (also known as a spatialization) whereby each document is represented by an icon and the distance between the icons represents the similarity between the documents (e.g., ACQUAINTANCE and PARENTAGE: Liu et al., 2000; LEXIMANCER: Smith, 2000; TEXT GARDEN DOCUMENT ATLAS: Fortuna et al., 2006). That is, the icons for documents that are determined to be similar are positioned closer to each other in the display. This type of layout is consistent with Montello et al.'s (2003) ‘First Law of Cognitive Geography’ which states that “people believe closer things to be more similar than distant things” (p. 317). In two relatively large-scale user studies (N=45 and 48), these authors provided empirical evidence for this principle. The success of these displays possibly lies in the ability of the human visual system to easily detect patterns in the display, such as clusters and outliers. As Brusco (2007) has pointed out, the ability to partition such point arrays into clusters is one of many visual combinatorial optimization problems for which the human visual system appears to be very well adapted (see also Vickers et al., 2001).
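To make the distance-similarity idea concrete, the following is a minimal sketch of how a point-based spatialization could be computed from a matrix of pairwise document dissimilarities. It uses classical (Torgerson) multidimensional scaling with a simple power iteration, which is an illustrative choice and not necessarily the algorithm used by any of the tools cited above. The function name `classical_mds_2d` and its parameters are hypothetical.

```python
import math


def classical_mds_2d(dissim, iterations=500):
    """Place n items in 2D so that distances approximate `dissim`.

    `dissim` is assumed to be a symmetric n x n list of lists with a zero
    diagonal. This is an illustrative sketch, not a production implementation.
    """
    n = len(dissim)
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        # Deterministic, normalized starting vector for the power iteration.
        vec = [math.sin(i + 1.0 + axis) for i in range(n)]
        start_norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / start_norm for x in vec]
        # Power iteration converges to the eigenvector of largest magnitude.
        for _ in range(iterations):
            nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: eigenvalue estimate for the unit vector `vec`.
        eigval = sum(vec[i] * b[i][j] * vec[j]
                     for i in range(n) for j in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][axis] = scale * vec[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * vec[i] * vec[j]
    return coords
```

When the dissimilarities are exact planar distances, the recovered layout reproduces them up to rotation and reflection. Human similarity judgments are generally not exactly Euclidean, so in practice the layout is only an approximation, and the power iteration here assumes the dominant eigenvalues are positive.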
The reliance of spatialized displays on the capabilities and limitations of the human visual processing system, and the need to provide empirical evidence as to their real-world effectiveness, suggest that it is important to study user behavior. In particular, it seems important to study whether or how visualizations facilitate performance on the information-handling tasks they are designed to support. As a result, there is a growing body of empirical research into the effectiveness of visualizations (e.g., Butavicius and Lee, 2007, Cribbin and Chen, 2001, Don et al., 2007, Fabrikant et al., 2004, Fabrikant et al., 2006, Tory and Möller, 2004, Tory et al., 2007, Lee et al., 2003, Sanyal et al., 2009, Ware, 2000, Westerman et al., 2005, Westerman and Cribbin, 2000).
While the notion of visualizations such as spatialized displays may have strong intuitive appeal, the empirical support for a performance advantage over traditional list-type displays is not unequivocal. Certainly, there is strong evidence that the performance associated with 2D visualizations is better (or at least no worse) than 3D visualizations (Fabrikant, 2002, Newby, 2002, Sebrechts et al., 1999, Westerman and Cribbin, 2000). In addition, Butavicius and Lee (2007) and Tory et al. (2007) have found better performance with spatialized displays when compared to list and landscape displays. However, Cribbin and Chen's (2001) user study found better performance, at least for some tasks, for a list-based display when compared to a spatialized display and two network displays that also contained links between related documents. Similarly, Hornbæk and Frøkjær (1999) found that there was no difference in the number of documents retrieved or marked as relevant between text-only search and a visualization display known as a thematic map (essentially a spatialized display with the addition of “theme” words on the display) but that participants took longer when using the thematic map. Qualitative analysis of participants' verbal descriptions of their thoughts and actions in the study suggested that there was a tendency for the users to get ‘lost’ in the thematic map displays. Finally, Swan and Allan (1998) found only modest improvements in more sophisticated spatialized displays over text-only interfaces. As suggested by Newby (2002), it is still unclear whether “visual interfaces for IR can be more effective than text-based interfaces” (p. 50).
There could be two reasons for the variation in findings regarding list versus spatialized displays in the literature. Firstly, many of the user studies reported in the literature are based on small sample sizes. For example, Swan and Allan (1998), Cribbin and Chen (2001) and Hornbæk and Frøkjær (1999) used sample sizes of 24, 15 and 6, respectively. While such smaller scale studies can be very informative, their results are harder to generalize and lack statistical power. Secondly, there is a great deal of variation between studies in the tools being tested, with many studies employing more complete visualization tools with different combinations of functionalities (e.g., INFOSKY: Granitzer et al., 2004; YAVI: Newby, 2002). As a result, the potential influence of a range of interface and functionality variables is not well controlled across the different experiments.
In this paper, we seek to address the first issue using comparatively large sample sizes (N=48 and 49) for our two experiments, and also by using a repeated-measures design to increase statistical power. We attempt to address the second issue by testing specific components of a visualization interface as opposed to a complete visualization tool. A final feature of our approach is that we use human document similarity judgments, rather than machine substitutes, in the construction of the displays. This is particularly important given the inconsistencies between human and machine judgments demonstrated in Lee et al. (2005). The use of human similarities allows us to focus on testing the visual component of the visualization in isolation from the quality of the underlying document similarities (for further discussion see Butavicius and Lee, 2007).
Our approach is similar to ‘de-featured’ systems (Morse and Lewis, 1997) and BASSTEP methodologies (Morse et al., 2002), in which only basic features are tested or introduced at each stage of the development process. Our approach differs from these methodologies in that we are using a controlled experimental framework for our testing. As Walenstein (2002) has pointed out, testing of the isolated components of software is necessary to comprehend “the abstract principles of the support provided by the tools rather than the interfering details of the particular prototype” (p. 39). This means the findings of this paper may inform the design of a wide range of visualization and information retrieval tools that employ spatialized and list-based displays.
Our study builds on one reported by Butavicius and Lee (2007), who evaluated the performance of 80 participants in an experiment using four different visualization techniques applied to news articles. The displays were a Random List, an Ordered List and two two-dimensional visualizations using the multidimensional scaling (MDS: Shepard, 1980) and Isomap (Tenenbaum et al., 2000) layout algorithms. All but the Random List display were constructed using human judgments of document similarity from Lee et al. (2005) to ensure that they were structured using a cognitive model of the document space. In the Butavicius and Lee (2007) study, participants performed best – in the sense that they were faster and accessed fewer documents – when using the structured displays and the two-dimensional (2D) spatialized displays outperformed the one-dimensional (1D) lists.
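The difference between the unstructured and structured one-dimensional conditions can be sketched as follows. The source does not describe here how the Ordered List was constructed, so the greedy nearest-neighbour chaining below is only an assumed stand-in for "a list ordered by similarity"; the function names `random_list` and `greedy_ordered_list` are hypothetical.

```python
import random


def random_list(n_docs, seed=0):
    """Unstructured condition: documents appear in a shuffled order."""
    order = list(range(n_docs))
    random.Random(seed).shuffle(order)
    return order


def greedy_ordered_list(dissim, start=0):
    """Structured 1D condition (assumed method): repeatedly append the
    not-yet-listed document most similar to the last one listed."""
    n = len(dissim)
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by document index so the result is deterministic.
        nxt = min(remaining, key=lambda j: (dissim[last][j], j))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Both functions return a permutation of document indices, so the same set of documents can be rendered as either list; only the ordering, and therefore the amount of similarity structure visible to the user, differs.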
Our study extends Butavicius and Lee's (2007) paper in two main ways. Firstly, in the previous experiment, all the experimental conditions with a 2D layout were structured using algorithms operating on human judgments of document similarity. Therefore, it is not possible to rule out the hypothesis that the performance achieved in these conditions was simply due to the fact that the documents were laid out in the 2D plane. Westerman and Cribbin (2000) showed that increasing the semantic variance accounted for by 2D solutions in spatialized displays (to the order of 50%, 70% and 90%) improved performance on a search task. However, it is not possible to determine from either Butavicius and Lee's (2007) or Westerman and Cribbin's (2000) study whether simply laying document representations out randomly in a 2D plane, without any structuring according to semantic information, may still be advantageous to the user (or conversely, whether such visualizations are indeed worse than unstructured list-based displays). For example, a random 2D spatialized display may help a user remember where documents are located better than an unordered list does. To address this issue, we include a random 2D spatialized display condition in the second experiment in this paper. In so doing, we also address another issue in real-world applications of visualization tools, concerned with how a visualization of document space will perform in cases where there is little semantic structure to be found (i.e., the documents are all from disparate topics). In many intelligence and exploratory applications of visualization tools, the corpus of documents changes frequently, so the semantic structuring of the space changes rapidly and unpredictably. In addition, distinct semantic structure may be less apparent in visualizations of email and spontaneous speech because, as we discuss shortly, the topicality in such texts can be varied both between and within the documents. It is therefore useful to determine whether visualization tools provide any advantage or disadvantage over conventional list-based displays in these ‘worst-case’ scenarios.
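As a rough illustration of the contrast between a structured and a random 2D layout, the sketch below generates a random layout and scores any layout by the squared Pearson correlation between on-screen distances and document dissimilarities. Treating this squared correlation as "variance accounted for" is an assumption made for illustration; it is not claimed to be the measure used by Westerman and Cribbin (2000). The names `random_layout` and `layout_fit` are hypothetical.

```python
import math
import random


def random_layout(n_docs, seed=0):
    """Unstructured 2D condition: icons at uniformly random positions."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_docs)]


def layout_fit(coords, dissim):
    """Squared Pearson correlation between layout distances and
    dissimilarities, computed over all unordered document pairs."""
    xs, ys = [], []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(math.hypot(coords[i][0] - coords[j][0],
                                 coords[i][1] - coords[j][1]))
            ys.append(dissim[i][j])
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so the correlation is undefined
    return (sxy * sxy) / (sxx * syy)
```

A layout that reproduces the dissimilarities exactly scores 1, whereas a random layout will typically score much lower, which is one way to quantify how little semantic structure a random spatialization conveys.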
The second way in which the current paper builds on Butavicius and Lee (2007) is by examining email and transcriptions of telephone conversations. As with most user studies in visualizations, Butavicius and Lee (2007) used well-edited documents. Many previous assessments of visualizations have used similar documents in the form of news articles (e.g., Cribbin and Chen, 2001, Granitzer et al., 2004; Experiment II: Newby, 2002) and journal articles (e.g., Hornbæk and Frøkjær, 2003). These sorts of articles are also used extensively to test information retrieval tools in benchmark tests and competitions (Voorhees and Harman, 2005). However, it remains to be seen how well visualization techniques perform when faced with more spontaneous, less polished texts such as unrehearsed conversations and emails.
Spontaneous speech and email are similar to a range of newer communication media involving computer-to-computer interactions including web logs (colloquially known as “blogs”), Internet forums and instant messaging. These fora are increasing in popularity and represent a wealth of information that lends itself to exploration using visualization tools. All of these differ from professionally edited news articles in a number of ways including:
- a. Linguistic features: particularly in spoken dialog, the presence of features such as speech repairs (Levelt, 1983) and discourse markers (Shiffrin, 1987) can make interpretation of such language difficult (Heeman and Allen, 1997).
- b. Vocabulary: more conversational or informal communications often feature the use of slang and more fluid language use including specialized vocabulary, emoticons, acronyms and abbreviations.
- c. Information density: these documents are often characterized by their “loose, unstructured, garrulous or unedited quality” and may be considered to be ‘information poor’ in comparison to documents that are engineered by communication experts (Toffler, 1970, p. 155). In contrast, engineered documents such as articles, scripts and formal speeches are “highly purposive … [and] pre-processed to eliminate unnecessary repetition” (Toffler, 1970, p. 155).
- d. Breadth of topicality: rather than being focused on a particular topic, these less-structured documents can cover a range of different topics.
These characteristics can make such communications difficult to analyze for both humans (Hornbæk and Frøkjær, 2003, Ratté et al., 2007) and computers.
Given the dialogic character of these media, many tools for visualizing such archives have centered around presenting and analyzing patterns in the metadata, e.g., the sender, recipient and time/date stamp information associated with an email (e.g., MAILVIEW: Frau et al., 2005) or the author information and thread in newsgroups and web forums (e.g., CONVERSATIONAL LANDSCAPE and LOOM: Donath et al., 1999). Subjective assessments of such visualizations, when used to display hierarchical, correlational and temporal patterns in email archives, have been favorable (Perer et al., 2006, Perer and Smith, 2006). There have also been efforts to represent the content of such communications. There are tools documented in the literature containing spatialized displays for visualizing the entities (i.e., people, places, dates and organizations) within an email corpus (JIGSAW: Görg and Stasko, 2008), author's mood (e.g., CONVERSATIONAL LANDSCAPE and LOOM: Donath et al., 1999) as well as content similarities between individual messages via spatializations for blogs (INSPIRE: Gregory et al., 2007; VIZBLOG: Pérez-Quiñones et al., 2007) and topicality clustering of emails (BUZZTRACK: Cselle et al., 2007). However, despite this increased interest, we are not aware of any empirical study to date that has tested the performance of visualizations of such communications.
In this paper, we present two experiments that examine how well several proximity-based visualization techniques assist a user in the analysis of spontaneous speech transcripts and email texts. In these displays, we are interested in representing the content of a collection of texts across a number of individuals. This type of display could be of use to an analyst who, for the purposes of business, political or security intelligence gathering, is exploring a corpus of unstructured, spontaneous texts from multiple authors to understand the content of the communications. This type of exploratory analysis of data and documents can play an important role in the work of an intelligence analyst (Gersh et al., 2006, Pirolli and Card, 2005).
Section snippets
Experiment I: spontaneous speech
In the first experiment, we compared visualization performance using transcriptions of unrehearsed telephone conversations. The types of visualization techniques were similar to those used in Butavicius and Lee (2007) including a Random List, a structured list, a 2D display based on the Isomap algorithm (Tenenbaum et al., 2000) and another 2D display based on multidimensional scaling (MDS: Shepard, 1980). However, as mentioned above, we used transcriptions of unrehearsed, telephone
Experiment II: Enron emails
Experiment II differs from Experiment I in two main ways. Firstly, the document set consists of emails from the Enron Corporation data set rather than transcriptions of spoken dialog. During the legal investigation of the Enron Corporation, the Federal Energy Regulatory Commission released a large collection of actual emails from the corporation, containing over 600,000 messages, from approximately 150 employees (Klimt and Yang, 2004). These emails not only contain messages relevant to the
Comparison between experiments
There was a consistent trend across the three visualizations assessed in the two experiments in this paper and in the experiment in Butavicius and Lee (2007). MDS outperformed the Ordered List, while the Ordered List was superior to the Random List. However, the performance advantage was expressed differently between the studies in terms of either speed or accuracy. Fig. 15 shows the relative performance of the three common visualizations in terms of speed and accuracy.
Fig. 16 demonstrates that
Conclusion
In two studies, we found that the 2D visualizations structured according to a cognitive representation of the underlying document similarities outperformed a 1D visualization of the same similarities when applied to unstructured texts. Both of these types of displays performed better than an unstructured list. These findings parallel those for visualizations of highly structured news articles (Butavicius and Lee, 2007). In the second experiment of this paper we also showed that the cognitive
Acknowledgments
We wish to thank Chlöe Mount, Joanne Spadavecchia and Andrew Brolese for conducting the experiments, Chris Jones for his work on the visualization interface and Ian Coat, Glen Smith and several anonymous reviewers for their assistance and helpful suggestions. Daniel Navarro was supported by an Australian Research Fellowship (ARC Grant DP-0773794).
References (65)
- et al. An empirical evaluation of four data visualization techniques for displaying short news text similarities. International Journal of Human-Computer Studies (2007)
- et al. Visualizations of binary data: a comparative evaluation. International Journal of Human-Computer Studies (2003)
- et al. Avoiding the dangers of averaging across subjects when using multidimensional scaling. Journal of Mathematical Psychology (2003)
- Monitoring and self-repair in speech. Cognition (1983)
- et al. Combining mining and visualization tools to discover the geographic structure of a domain. Computers, Environment and Urban Systems (2006)
- et al. Browsing a document collection represented in two- and three-dimensional virtual information space. International Journal of Human-Computer Studies (2005)
- et al. Mapping semantic information in virtual space: dimensions, variance, and individual differences. International Journal of Human-Computer Studies (2000)
- et al. Using clustering and classification approaches in interactive retrieval. Information Processing and Management (2001)
- et al. On the dangers of averaging across subjects when using multidimensional scaling or the similarity-choice model. Psychological Science (1994)
- Basalaj, W., 2000. Proximity Visualization of Abstract Data. Unpublished Doctoral Dissertation. University of Cambridge...
- Measuring human performance on clustering problems: some potential objective criteria and experimental research opportunities. Journal of Problem Solving
- Statistical Power Analysis for the Behavioural Sciences
- Multidimensional Scaling
- Visual-spatial exploration of thematic spaces: a comparative study of three visualization models. Electronic Imaging 2001: Visual Data Exploration and Analysis VIII
- The distance-similarity metaphor in region-display spatializations. IEEE Computer Graphics and Applications
- The distance-similarity metaphor in network-display spatializations. Cartography and Geographic Information Science
- Visualization of text document corpus. Informatica
- Supporting insight-based information exploration in intelligence analysis. Communications of the ACM
- SWITCHBOARD-1 Transcripts LDC93S7-T. CD-ROM
- Finding scientific topics. Proceedings of the National Academy of Sciences