Visualization of information: a proposal to improve the search and access to digital resources in repositories

At present, the most notable challenges that repositories face in resolving problems with searches for digital resources lie in conveying how resources are classified according to a knowledge representation scheme and how they relate to one another. However, one of the least researched areas in the field is the study of visual search interfaces that provide access to relevant materials in digital repositories; more specifically, access to digital resources according to the areas and sub-areas of a particular knowledge domain through a taxonomy classification. In this article, we focus on proposing a best practice for searching and accessing relevant digital resources in repositories through visualization techniques. The article presents a prototype tool as one possible approach to facilitating search and access to digital resources. Finally, we present conclusions and future work in the field aimed at improving access to relevant materials in digital repositories.


Introduction
One of the greatest challenges facing specialized data repositories is how best to facilitate access to digital resources. Studies (Marchionini, 2006; White & Roth, 2009) that focused specifically on search strategies and browsing capabilities for locating digital resources in this type of interface demonstrated that conventional search and exploration strategies are not sufficiently robust and flexible to facilitate access to, and location of, a collection of digital resources. In terms of efficiency, these results exhibited a loss of all the semantic capacity that some specialized repositories make available by defining metadata, although these requirements are key to enriching the processes of resources, and finally, iv) identify effective interfaces for performing browsing and searching processes over digital resources.
The motivation of this article is to analyze whether information visualization can help users improve access to a collection of digital resources hosted by repositories. To carry out this study, we developed a tool based on four visualization techniques (tree, radial, category, hyperbolic) (Herman, Melancon, & Marshall, 2000). In order to obtain real results, we took a sample collection of 42,800 digital resources extracted from the Europeana digital library, covering concepts and topics of art, culture, and heritage defined by the Art and Architecture Thesaurus (AAT), which served as the knowledge representation scheme. Finally, we present the results of a usability study conducted to evaluate the usefulness and efficiency of visual search interfaces based on a hierarchical navigation structure. A total of 16 participants, all with basic knowledge of web searching, took part.
The main purpose of the study was to evaluate the use of information visualization with respect to i) the relevance of the hierarchical classification in each of the visualization techniques applied to search and access a collection of digital resources (visual perception), ii) the effect of visual interfaces on exploring digital resources (effectiveness), and finally iii) the effect of the aesthetic design of visual interfaces on accessing digital resources (usefulness).
The second section of this paper introduces the theoretical background, previous efforts in the field, and related investigations. Section 3 explains the methodology used to develop the visual tool interfaces, followed by the usability test. Section 4 presents the results of the usability study based on user interaction. Finally, we present an analysis of the evaluation together with the final conclusions of the study.

Background
In recent years, digital repositories have undergone significant technological development owing to the exponential increase in the number of digital resources published (Margaryan & Littlejohn, 2008). This growth has led to the development of several strategies in different areas: i) in terms of technology, the development of distributed repositories, heterogeneous repositories and federations of repositories as central access points to each of them (McGreal, 2008); ii) in terms of semantics, the linking of knowledge classification schemes through ontologies and thesauri to provide a better understanding and organization of digital resources; and finally iii) in terms of access, the design of strategies to offer metadata descriptions. This last strategy turns out to be an essential condition for finding relevant results within a digital repository (Cechinel et al., 2009; Park, 2009).

Related research
Empirical studies have shown that deficiencies in interface design can interrupt user activity throughout information search processes (Tsakonas & Papatheodorou, 2008). One of the first proposals to improve access to learning objects using visual strategies was made by Card (Card, Mackinlay, & Shneiderman, 1999). In this study, users were assisted by various zooming techniques, and so they were able to approach the most relevant learning objects, with additional search criteria restricting the number of remaining objects. Kim (Kim, 2006) conducted another study to measure the level of user satisfaction with two digital libraries and an e-print repository. The findings revealed serious design-related interface problems, particularly in defining input formats for queries and in displaying search results in the browser.
There are a few studies concerned with user interaction through information visualization in open access digital libraries. Tsakonas (2008) conducted a study to determine whether content and system features significantly affect the usefulness and usability of an Open Access (OA) repository. The study found that attributes related to the relevance of information and to learning facilities directly affect interaction ability and user satisfaction. The MACE project (Stefaner et al., 2007) is a European project concerned with searching for digital resources associated with architecture. The results of the field search suggest that principles of navigation with multiple facets facilitate immersive search processes through collaborative tagging (Stefaner et al., 2009).

Information Visualization
Information visualization is the visual representation of complex information using appropriate graphical spaces and structures in order to facilitate rapid assimilation and comprehension. Its development has given rise to a strong line of research disciplines that have advanced along major trajectories in the field of computer graphics. One such discipline is Human-Computer Interaction (HCI), a field in constant growth that is based on user requirements and execution times in order to avoid mistakes and thereby provide product satisfaction to users (Marchionini, 1997; Shneiderman & Ben, 1998).
Information visualization has several areas of study. These have been addressed from the standpoint of graphical perception, through research into the influence of visual variables such as position, length, color and shape on the effectiveness of data visualization (Cleveland, 1984; Simkin & Hastie, 1987; Vesey & Galletta, 1991). There have also been studies of visual perception models, in which some principles are raised from the point of view of computer graphics (Branch & Olague, 2001). Other studies work with advanced techniques such as eye-tracking visual analysis and strategies for evaluating product designs on the Web (Heer & Bostock, 2010; Simkin & Hastie, 1987), along with a diverse list of applications and research-oriented production lines (von Landesberger et al., 2011).

Visualization techniques
Several types of techniques are used in data visualization, and some are more appropriate than others for certain graphical structures (Herman et al., 2000). For the purposes of our study, we selected techniques according to: i) graphical representation structures (Schulz, Hadlak, & Schumann, 2011), ii) thematic clustering strategies (Shneiderman, Feldman, Rose, & Grau, 2000), and finally iii) hierarchical representations of categories (Khoo, Kusunoki, & MacDonald, 2012; Tenopir, 2003).
Based on these selection criteria, we adapted and integrated four interfaces based on four information visualization techniques. Each of them is described next.

Tree visualization type
This is a classic type of visualization used to locate resources hierarchically across different levels (Fig. 1). For this strategy we implemented a navigational structure (Plaisant, Grosjean, & Bederson, 2002) in order to explore concepts through a hierarchical structure.

Radial visualization type
This is a circular visualization technique that uses graphical components to represent nodes (Fig. 2). The technique uses links that identify a navigation structure according to a previously defined classification (Eades, 1990).

Hyperbolic visualization type
These are radial-type structures that differ in their use of focus-and-context techniques based on hyperbolic geometry for visualizing and manipulating large hierarchies (Lamping & Rao, 1996) (Fig. 3).

Category visualization type
Also known as folder navigation, this type of visualization is appropriate for handling hierarchies and classifications (Fig. 4). The main purpose of this visualization technique is to solve problems of semantic interoperability (Noik, 1993).

Methodology
For our study, we defined three phases. In the first phase, we selected a set of AAT thesaurus terms in order to define the concepts of the navigation structure. We then used the Europeana API (Europeana, 2013) to connect the metadata and link the digital resources supplied by external content providers. In the second phase, the interfaces were integrated according to four visualization techniques (tree, radial, hyperbolic and category). To develop the interfaces, we defined the navigation structure according to the knowledge representation scheme of the AAT thesaurus. Finally, in the third phase, we proposed a usability study to gather quantitative and qualitative data in order to assess the effectiveness and usefulness of the interfaces in terms of accessibility and the use of navigation structures.
Utility and usability are mutually dependent, as noted by Dillon (Dillon & Morris, 1999): "Usability represents the degree to which the user can exploit the utility." The purpose of usability analysis is to reduce user frustration when performing tasks (Norman, 2005).
As the reference for our case study, we took the search for relevant digital resources according to the taxonomic classification and specific knowledge area of art, culture and European heritage defined by the AAT thesaurus.
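The first phase pairs each thesaurus term with a query against Europeana. The following is a minimal sketch of that step, not the tool's actual code. The endpoint URL, the parameter names (`wskey`, `query`, `rows`, `start`) and the response fields (`totalResults`, `items`, `title`) are assumptions based on the public Europeana Search API and should be checked against its current documentation. The function names, the API key and the sample payload are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public Europeana Search API; verify before use.
SEARCH_ENDPOINT = "https://api.europeana.eu/record/v2/search.json"


def build_search_url(term, api_key, rows=12, start=1):
    """Build a search URL for one thesaurus term (parameter names are assumed)."""
    params = {"wskey": api_key, "query": term, "rows": rows, "start": start}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def summarize_response(payload):
    """Extract the resource count and the first title of each item
    from an already decoded JSON response."""
    titles = [(item.get("title") or [""])[0] for item in payload.get("items", [])]
    return {"total": payload.get("totalResults", 0), "titles": titles}


# Hypothetical usage: placeholder key and a hand-written sample payload,
# so no network request is made here.
url = build_search_url("Baroque", api_key="YOUR_API_KEY")
sample = {
    "totalResults": 2,
    "items": [{"title": ["Altar piece"]}, {"title": ["Church facade"]}],
}
print(url)
print(summarize_response(sample))
```

The count returned for each term is the kind of figure that could be attached to the corresponding node of the navigation structure.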

Visual structures
The interfaces were developed on an open-source API called Infovis (http://philogb.github.io/jit/). This was achieved by using ActionScript to evaluate the hierarchical structure based on the principles of well-formed knowledge representation schemes. The taxonomic structure for the visual representation was implemented in the JSON (JavaScript Object Notation) data format (Crockford, 2006), which the API supports for loading data. We evaluated four visualization techniques for the design and integration of the visual interfaces. Each interface was loaded with the same taxonomic structure of AAT thesaurus terms related to the topic of art, culture and European heritage. These terms were connected with the number of digital resources explored in Europeana. By clicking on a node, users can display a representation of each term given by the thesaurus. Users can view the classification of thematic areas, the number of resources associated with each term, and the digital resources themselves, listed through a paging system and classified by language, content provider, format and copyright.
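As an illustration of how a taxonomic structure might be serialized to JSON for the visual interfaces, the sketch below builds a nested tree from a flat list of terms. The `id`/`name`/`data`/`children` node layout is assumed to match the tree format used in the toolkit's examples. The `resources` key, the term identifiers and the resource counts are hypothetical and are not taken from the study.

```python
import json


def build_tree(terms):
    """Nest a flat list of (term_id, label, parent_id, resource_count)
    tuples into a single JSON-ready tree rooted at the parentless term."""
    nodes = {
        term_id: {
            "id": term_id,
            "name": label,
            "data": {"resources": count},  # hypothetical key for the resource count
            "children": [],
        }
        for term_id, label, _parent, count in terms
    }
    root = None
    for term_id, _label, parent_id, _count in terms:
        if parent_id is None:
            root = nodes[term_id]
        else:
            nodes[parent_id]["children"].append(nodes[term_id])
    return root


# Hypothetical terms and counts, for illustration only.
terms = [
    ("t1", "Styles and periods", None, 0),
    ("t2", "Baroque", "t1", 120),
    ("t3", "Renaissance", "t1", 95),
    ("t4", "Early Renaissance", "t3", 30),
]
tree = build_tree(terms)
print(json.dumps(tree, indent=2))
```

Because every interface reads the same structure, one serialized tree of this kind could in principle feed the tree, radial, hyperbolic and category views alike.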


Client tier: presents the user requests and the selection of the visual interface used to search digital resources according to a specific knowledge area of AAT terms related to "styles and periods".
Fig. 7 presents an example of the visualization tool showing the radial interface.

Results of usability study
In this section we present the results of the usability study. In line with the objectives of our analysis and with the recommendations defined for usability studies (Nielsen, 1994a, 1994b), 16 participants were selected for the tests. All participants were middle-aged and fluent in using Web applications for Internet searches.
Given the nature of the study, users did not need to have highly specific knowledge of taxonomic structures. Instead, careful consideration was given to their knowledge of search methods and interfaces, which, along with the other aforementioned data, was collected in a questionnaire. The participant distribution was as follows: 4 researchers (25%), 4 graduate students (25%), 4 undergraduate students (25%) and 4 high school students (25%), for a total of 16 participants.
Specific proposals exist for usability principles focused on the level of consistency (Shneiderman & Ben, 1998), heuristic factors (Nielsen, 1994b), and error handling and recovery (Polson & Lewis, 1990), among others. As a proof of concept, we analyzed how easily users can search digital resources through visual search interfaces; for that reason, we used digital resources on European heritage, obtained through the Europeana API, as a case study. The main goal is to see whether these visualizations allow users to locate digital resources in a more interactive way. To carry out this study, we analyzed all interfaces using descriptive statistics and the ANOVA test to show the relevance of the usability attributes related to the following hypothesis: "The hierarchical classification of a graphical interface will have a positive effect on access to a collection of digital resources".
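To make the ANOVA step concrete, the sketch below computes the F statistic of a one-way, between-groups ANOVA over ratings grouped by interface. The text does not specify which ANOVA variant was used, so the one-way form is an assumption, and the ratings shown are hypothetical values on a 1 to 5 scale rather than the study's data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the between-group mean square divided by the within-group
    mean square.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings for the four interfaces (not the study's data).
ratings = {
    "tree": [4, 5, 4, 4],
    "radial": [3, 4, 3, 2],
    "hyperbolic": [2, 3, 2, 3],
    "category": [5, 4, 4, 5],
}
f_stat, df_b, df_w = one_way_anova_f(list(ratings.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F relative to the critical value for the given degrees of freedom would indicate that mean ratings differ between interfaces. Since every participant rated every interface, a repeated-measures design may be the more appropriate choice in practice.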

Contrasting hypothesis
To verify our hypothesis, we performed a Pearson correlation analysis to identify whether the taxonomy of these interfaces affects access to a collection of digital resources (Table 1). The Pearson correlation coefficient R is an index that measures the linear relationship between two quantitative random variables. Its maximum value is 1, and a value of at least 0.6 is considered acceptable.
The value obtained (R = 0.695) is considered acceptable, but not ideal. By the same measure, the coefficient of determination (R² = 0.46) and the p-value (p < 0.05) are ideal. These results indicate that the null hypothesis is rejected, and it is therefore accepted that there is a relationship between taxonomy and accuracy.
We therefore confirm that the hierarchical classification attribute affects access to a collection of digital resources, and does so to a considerable degree. The category interface showed the highest degree of effectiveness (mean = 4.38, SD = 0.744).
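The correlation coefficient discussed above can be computed as in the following sketch, which implements the standard Pearson formula (the covariance of the two variables divided by the product of their standard deviations). The paired scores are hypothetical and only illustrate the calculation; they do not reproduce the values in Table 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical paired scores per participant (not the study's data):
# a rating of the hierarchical classification and a rating of access.
taxonomy_scores = [4, 3, 5, 2, 4, 3, 5, 4]
access_scores = [4, 2, 5, 3, 3, 3, 4, 5]

r = pearson_r(taxonomy_scores, access_scores)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
```

The function assumes that both variables vary; if either sequence is constant, the denominator is zero and the coefficient is undefined.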

The effect of hierarchical classification
This attribute refers to the evaluation of the taxonomic structure of each interface, that is, the classification structure of a navigational search at the graphical level as perceived by users, according to their experience with the selected interface.
In general, according to the visual perception results, the hierarchical classification of all interfaces achieved a good level of satisfaction among users, with 53.1% good ratings, followed by 18.8% regular, 12.5% very low, 9.4% excellent, and 6.3% low ratings. However, in terms of the ease of use of the visual search interface according to hierarchical classification, users preferred the tree and category interfaces. Table 2 presents the mean and standard deviation used to analyze how the evaluated interfaces were perceived at the level of hierarchical classification. At this level, the tree interface has a high average with respect to the other interfaces (mean = 4.11, SD = 0.609).
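The descriptive statistics reported here (rating shares, mean and standard deviation) can be derived from raw ratings as sketched below. The ratings are hypothetical values on a 1 to 5 scale. The use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was used.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return the mean, sample standard deviation and percentage share
    of each rating value for a list of integer ratings."""
    counts = Counter(ratings)
    total = len(ratings)
    shares = {score: round(100 * counts[score] / total, 1) for score in sorted(counts)}
    return {"mean": mean(ratings), "sd": stdev(ratings), "shares": shares}


# Hypothetical ratings for one interface (not the study's data).
tree_ratings = [4, 4, 5, 3, 4, 5, 4, 3]
summary = summarize_ratings(tree_ratings)
print(summary)
```

Running the same summary for each interface would yield the rows of a table such as Table 2.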

The effect of effectiveness
For this aspect, we evaluated how effectively users found the concept or topic of their choice in order to search for digital resources. Table 3 presents the results associated with this evaluation criterion.
In this case, the interface that demonstrated the best effectiveness in locating concepts and topics in the navigational structure was the category interface (mean = 4.375, SD = 0.517), followed by the tree interface (mean = 4.00, SD = 0.7011). Finally, Fig. 8 presents the average time required by each user to search for a term or topic in the navigational structure. The average time per user shows that the tree and category interfaces are the better interfaces for effectively locating terms within a navigation structure.

The effect of usefulness
Table 4 presents the usefulness results, which relate to how easy each interface was to use for carrying out the search process according to the navigation structures defined by the AAT thesaurus classification. Once again, the category (mean = 4.11, SD = 0.609) and tree (mean = 4.11, SD = 0.609) interfaces received the best ratings in comparison with the other interfaces.

Conclusions
Other research that applied taxonomic structure strategies in graphic visualization (Graham, Kennedy, & Benyon, 2000) found that users' preference for the visual representation of information on-screen strongly reflected their own mental model of the information rather than the actual underlying structure of the information. However, in our case study this condition did not apply to all of the visualization techniques developed.
We found that without good usability definitions in the interfaces (for example, the definition of events to mark the navigation paths), users easily abandon the navigation process they are performing (hyperbolic). In contrast, interfaces that have good usability definitions of this kind (radial, tree, category) have proven to be of great advantage for locating resources within the navigation scheme, and have therefore allowed users to understand the hierarchical classification structure and continue with the exploration process.
The size of the graph is a typical problem in data visualization. Few techniques can effectively handle thousands of nodes, although graphs of this order of magnitude appear in a wide variety of applications, and combinations of display techniques (Blanch & Lecolinet, 2007; Muelder & Ma, 2008) can address such data accurately. In this case, at the level of taxonomy, the deeper the level of hierarchy, the lower the access to digital resources.
In the correlation analysis, we found that the proposed hypothesis is not discarded, because all p-values are less than 0.05 and the correlation coefficient was greater than 0.5 in all cases. The Pearson correlation obtained for the hypothesis was acceptable (R = 0.695), which indicates that the attributes of the taxonomy could be a key factor in improving access to digital resources. However, participants need to understand the navigation structure in order to search for resources in a hierarchical structure within a specific knowledge area. It is therefore important to define a good navigation strategy for exploring the hierarchical classification categories and locating resources according to this classification.
Future work should focus on integrating components that perform additional metadata-based searches and connect the most relevant digital resources. To this end, it is important to identify the community of users and the knowledge area of the digital resources, in order to define a knowledge representation scheme, such as an ontology or thesaurus, with which to implement a navigational search structure.

Figure 1. Tree interface

Figure 2. Radial interface

Figure 4. Category interface

Fig. 5 presents the work model for the data transformation process used to design the taxonomical navigation structures and connect to the Europeana API in order to obtain digital resources related to the topics of the knowledge representation schemes defined by the AAT thesaurus.

Figure 5. Process of analysis and transformation of data sets

Fig. 6 presents the structure of the extensible architecture for visualizing collections of digital resources.

Figure 6. Process of analysis and transformation of data sets AAT
Data store tier: presents the collection of digital resources extracted from Europeana, and the terms associated with the knowledge area of the AAT thesaurus.
Application tier: presents the libraries of the visualization techniques applied, and handles requests for user-level queries, related keywords, an area of knowledge, or the selection of the interface used to display the visualization.

Figure 7. Radial visualization
Fig. 7 presents: (1) the mechanism to autocomplete textual searches through an AJAX component, where the terms represent topics of art, culture and European heritage according to the AAT thesaurus; (2) the central area, which shows the navigational search area through the taxonomy representation structure of the AAT; and (3) in the top left corner of the image, the different interfaces (visualization techniques) that the user can select in order to search digital resources. Finally, users obtain the resources related to the selected topic, and the software tool shows the title, description, type, content provider and language of the resources found on the results information page.