Distracting users as per their knowledge: Combining linked open data and word embeddings to enhance history learning

https://doi.org/10.1016/j.eswa.2019.113051

Highlights

  • Fight information overload by semantics-driven filtering and knowledge generation.

  • A new way of learning History based on natural language processing and Linked Data.

  • Customized collection of texts to train neural networks and check user knowledge.

  • Find and assess relevant interconnections among semantic entities using word vectors.

Abstract

Organizations that preserve and promote heritage must meet the expectations of sophisticated visitors who, far from simply wanting to be informed, wish to explore engaging and innovative technology-driven experiences that consider their particular interests and encourage them to discover more. We describe an approach based on quiz games that can be exploited to deploy such challenging experiences. The game poses multiple-choice questions about a particular theme, which a Humanities expert introduces through a brief narrative. Given the input text, a question and its right answer, our strategy provides the expert with a set of wrong alternatives (called distractors). These options are chosen from a (semi)automatically built, tailor-made corpus of documents by considering each player’s level of knowledge of the game theme and by exploiting Linked Open Data initiatives and natural language processing. On the one hand, the automatic selection of distractors helps the Humanities expert create games about very diverse topics without needing to be a specialist in all of them. On the other hand, distractors are related to the right answer of each question in an appealing and meaningful way, which helps arouse the visitors’ curiosity and their interest in exploring similar experiences in future visits. The work has been experimentally validated, achieving better results than a previous distractor identification strategy.

Introduction

The current impact of technology on users’ lifestyles and universal access to information through the Internet pose major challenges for cultural institutions that preserve and promote heritage, urging a change in the kind of experiences offered to target audiences. To attract users’ attention, these institutions must be aware of their particular interests, preferences, needs and expectations. According to Gheorghilas, Dumbraveanu, Tudoricu, and Craciun (2017), today’s visitors are increasingly “sophisticated” and want to be “provoked” (rather than informed) by innovative and stimulating experiences. In this context, the goal is to move from the simple provision of information (which is already available on the Web) to enhanced experiences. These should be based on the interpretation of historical facts by Humanities experts, open debates, and active interaction with like-minded visitors through edutainment approaches (van Aalst & Boogaarts, 2002). In other words, it is necessary to switch from a simplistic, local exposition of historic events, characters and related artworks, which often disregards the interesting connections among all of them, to “immersive and engaging experiences where audiences can explore and discover stories about the world around them, past and present” (Carretero, Lopez, & Jacott, 1997; Gheorghilas, Dumbraveanu, Tudoricu, & Craciun, 2017; Seixas, 1994; Spoehr & Spoehr, 1994).

Providing this kind of challenging experience to improve History learning is the main research problem we address. Our first contributions in this field fall within the scope of the Horizon 2020 (H2020) project Crosscult. In that project, the authors developed a pilot application that allows a Humanities expert to create edutainment experiences driven by narratives (texts) about different topics, as well as quiz games for the target audience (see details in Daif et al., 2019). A user-friendly tool assists the expert in this task by discovering associations among historical facts and concepts that are meaningful for the theme of the created experience. The tool draws on a central knowledge repository populated with resources retrieved from Linked Data initiatives (mainly DBpedia and Europeana).

Along with the nodes that represent concepts in each association, our quiz games incorporate so-called question nodes. Each question node presents a question about the topics of the expert-provided narrative, along with a set of choices. These include the right answer (also identified by the expert) and other wrong alternatives called distractors. Arousing users’ enthusiasm for our quiz games and inspiring them to discover more requires the distractors to be chosen carefully:

  • First, distractors cannot be obviously false or trivial options, as this would cause users to lose interest in the game.
  • Second, distractors must be related to the right answer in a meaningful and ingenious manner, so that the quiz both encourages users to reflect and attracts their attention.
  • Third, distractors must be selected according to each user’s particular level of knowledge of the theme of the expert-created experience. Otherwise, distractors chosen for a knowledgeable visitor would discourage less knowledgeable users, who would probably fail most questions if the same distractors were used for both player profiles.
  • Last but not least, the selection of distractors should cope with a huge diversity of themes in order to attract a wide spectrum of potential visitors with very different interests.

Meeting the above requirements, and especially catering to the diversity of visitor-oriented experiences currently in demand, requires levels of customization that would be hard for the organization’s staff to manage manually. For that reason, we propose an approach that first builds a tailor-made corpus by retrieving relevant documents about the theme of the experience from Wikipedia. Next, we automatically select from those texts a set of meaningful distractors for each question, according to each user’s particular level of knowledge of the corresponding topic. For that purpose, we harness three resources:

  • public structured knowledge repositories bound to Linked Open Data initiatives, where a huge number of History-related concepts are linked and semantically described;
  • Natural Language Processing (NLP) techniques;
  • the word embeddings of the Word2Vec tool, in which knowledge discovery is supported by a neural network trained over the customized corpus (Mikolov, Chen, Corrado, & Dean, 2013).
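
The co-occurrence signal that Word2Vec learns from can be illustrated with a minimal sketch of the (center, context) training pairs produced by its skip-gram variant. The paper does not state here whether skip-gram or CBOW was used, nor which window size. The fixed window, the lowercase tokens and the function name `skipgram_pairs` are therefore illustrative assumptions. In practice, a library implementation (for example, gensim’s `Word2Vec` class) would be trained on the tailor-made corpus instead.

```python
from typing import Iterator, List, Tuple


def skipgram_pairs(sentence: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (center, context) pairs for every token within `window` positions.

    Hypothetical helper for illustration: a fixed window is used for simplicity.
    """
    for i, center in enumerate(sentence):
        lo = max(0, i - window)                    # first context position
        hi = min(len(sentence), i + window + 1)    # one past the last context position
        for j in range(lo, hi):
            if j != i:                             # a word is not its own context
                yield center, sentence[j]


if __name__ == "__main__":
    tokens = ["napoleon", "defeated", "at", "waterloo"]
    for pair in skipgram_pairs(tokens, window=1):
        print(pair)
```

Words that repeatedly share contexts across the corpus end up with nearby vectors. These co-occurrence-based connections are what the distractor selection exploits.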

Our approach brings significant benefits to cultural organizations, which are unlikely to have specialized staff capable of manually creating valuable games on a wide variety of topics (especially small and medium-sized institutions). Moreover, our automatic selection of distractors makes it possible to deploy custom-tailored games that cope with recurring changes in both users’ preferences and the topics posed in highly dynamic experiences. For instance, different distractors could be selected for a given question depending on the personal profile (i.e., interests and knowledge of the game theme) of the visitors who attend the venue throughout the day. Providing such visitor-oriented, tailor-made games would be unfeasible for a Humanities expert without automatic assistance.

The selected distractors are finally suggested to the expert so that s/he can integrate them while creating the experience. Our quiz games are aligned with the challenges described at the beginning of this section. For instance, the distractors can be exploited to deploy competitions among visitors located in different venues. They can also feed open debates (held physically or remotely) driven by experts who help interpret interesting historic facts interconnected during the quiz game. Both experiences boost learning while visitors have fun, and they promote reflection and knowledge retention, thus gaining widespread appeal (Daif et al., 2019). In addition, the expert-provided narrative might link historic topics to annual commemorations (e.g., International Women’s Day, the abolition of slavery, World Television Day). This makes it possible to shape engaging quiz games that explore appealing connections between past and present on the subject commemorated on that date. For instance, distractors could reveal connections between people who have been persecuted throughout History for their sexual orientation or religious beliefs. Such distractors would fit ad-hoc experiences created, for example in a museum, on the occasion of the international day of sexual and religious freedom. We argue that this kind of experience helps trigger visitors’ curiosity and awaken their interest in coming back to the venue to discover more.

This paper is organized as follows. Section 2 details the NLP technologies and Linked Data initiatives adopted in related works, highlighting the main functionalities we exploit (and even adapt) in our approach. Section 3 focuses on the procedure for collecting the documents of our tailored historical corpus. The algorithmic internals of our distractor identification strategy are detailed in Section 4. Next, Section 5 focuses on the experimental validation, where our new strategy is compared to the former approach to distractor selection we explored in Daif et al. (2019). Lastly, Section 6 concludes the paper and points out some lines of further research.

Technologies involved in our framework

As mentioned, the goal of this research is to identify appropriate distractors, given a question and its right answer. A distractor can be defined as an entity of the same nature as the answer, with both sharing some features that could lead to mistakes when selecting the answer. That is, distractors are used to check how solid the user’s knowledge of a theme is.
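
The definition above can be made concrete with a minimal sketch. It assumes that each entity carries a single type label (e.g., a DBpedia class) and a set of descriptive features (e.g., values of DBpedia properties). A candidate qualifies as a distractor if it has the answer’s type and shares at least one feature with it. The Jaccard overlap used for ranking, the `Entity` class and the function names are our own illustrative choices, not the measure used by the authors.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entity:
    name: str
    etype: str                                # assumed single type label, e.g. "Person"
    features: FrozenSet[str] = frozenset()    # assumed descriptive features


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Share of features two entities have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_distractors(answer: Entity, pool: List[Entity], k: int = 3) -> List[Entity]:
    """Keep entities of the answer's type that share at least one feature with it,
    ranked by Jaccard overlap (highest first), and return the top k."""
    same_type = [e for e in pool
                 if e.etype == answer.etype
                 and e.name != answer.name
                 and answer.features & e.features]
    same_type.sort(key=lambda e: jaccard(answer.features, e.features), reverse=True)
    return same_type[:k]
```

In a toy pool, an answer such as Napoleon would admit other persons who share, say, a head-of-state or military-leader feature. Places and unrelated persons would be discarded.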

These distractors will be extracted from a repository of information about the theme of the expert-proposed game. Even

Gathering a tailor-made corpus for each experience

In this section we describe how we collect the documents of the tailor-made corpus required to train the Word2Vec neural network. This network uncovers the (co-occurrence-based) connections exploited in our distractor identification strategy (details are given in Section 4). The process follows these guidelines to decide whether a new document is added to the corpus:

  • First, the documents must be about the topics of the experience, which is contextualized by the expert-created brief
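
Only the first guideline survives in this snippet. The following is one way to operationalize it, sketched under our own assumptions. Both the expert’s narrative and each candidate Wikipedia article are annotated with DBpedia entities. An article is kept if it mentions a large enough fraction of the narrative’s entities. The overlap measure, the 0.3 threshold and the function names are hypothetical, since the paper’s actual criteria are not shown here.

```python
from typing import Dict, List, Set


def topical_overlap(narrative_entities: Set[str], doc_entities: Set[str]) -> float:
    """Fraction of the narrative's entities that also appear in the document."""
    if not narrative_entities:
        return 0.0
    return len(narrative_entities & doc_entities) / len(narrative_entities)


def build_corpus(narrative_entities: Set[str],
                 candidates: Dict[str, Set[str]],
                 threshold: float = 0.3) -> List[str]:
    """Return the titles of candidate documents whose overlap reaches the
    (hypothetical) threshold, preserving the candidates' order."""
    return [title for title, entities in candidates.items()
            if topical_overlap(narrative_entities, entities) >= threshold]
```

Suppose a narrative mentions `dbr:Napoleon`, `dbr:Battle_of_Waterloo` and `dbr:Duke_of_Wellington`. An article that mentions two of them reaches an overlap of 2/3 and is kept. An article about football shares none and is discarded.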

Computing distractors

We are now equipped with a set of useful tools to process the information intelligently (Section 2), and we have access to a large dataset to browse and explore (Section 3). We are ready to describe the algorithm that suggests distractors to the expert so that s/he can complete the quizzes that check the user’s knowledge of the theme of the experience. The algorithm goes through three clearly differentiated phases: an initial one, supported by the
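
The three phases are truncated in this snippet, so the following is only an illustrative sketch, not the authors’ algorithm. It ranks candidate entities by the cosine similarity of their word vectors to the answer. It then slides a window over that ranking according to a player’s knowledge level in [0, 1]: knowledgeable players receive the closest (hardest) candidates and novices receive more distant ones. The toy vectors, the linear mapping from knowledge to rank position and the function names are assumptions. In practice, the vectors would come from the Word2Vec model trained on the tailor-made corpus.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_distractors(answer: str, vectors: Dict[str, Vector], candidates: List[str],
                     knowledge: float, k: int = 3) -> List[str]:
    """Rank candidates by similarity to the answer and return a window of k of them.

    Hypothetical policy: knowledge=1.0 selects the k most similar (hardest)
    candidates, knowledge=0.0 the k least similar, intermediate values in between.
    """
    if not 0.0 <= knowledge <= 1.0:
        raise ValueError("knowledge must be in [0, 1]")
    ranked = sorted((c for c in candidates if c != answer and c in vectors),
                    key=lambda c: cosine(vectors[answer], vectors[c]), reverse=True)
    if len(ranked) <= k:
        return ranked
    start = round((1.0 - knowledge) * (len(ranked) - k))   # window start in the ranking
    return ranked[start:start + k]
```

A linear window is only the simplest choice. A real system might instead use similarity thresholds calibrated per theme, or mix near and far candidates in a single question.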

Experimental evaluation

The experiments we have carried out pursued a two-fold goal:

  • On the one hand, we wanted to verify the effectiveness of the refinements developed in our new distractor identification strategy, compared to the preliminary approach we explored in Daif et al. (2019).

  • On the other hand, we needed to confirm the effect of each player’s particular expertise on the automatic selection of customized distractors.

Conclusions and further work

From a technical point of view, the importance of our work lies in the ability to discover new knowledge from the raw text sources available in Wikipedia. This is achieved by exploiting synergies between the semantically interlinked structured data from LOD repositories (mainly DBpedia, YAGO and Wikidata) and NLP techniques (specifically, Named Entity Recognition (NER) and word embeddings). In particular, NER tools make it possible to identify DBpedia named entities in Wikipedia articles and to use their semantic
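
NER-based linking of text to DBpedia entities is offered, for instance, by DBpedia Spotlight (Daiber et al., 2013). The sketch below builds a request URL for its annotate service and parses a reply into (surface form, URI, types) records; no request is actually sent. The endpoint URL and the JSON keys (`Resources`, `@URI`, `@surfaceForm`, `@types`) reflect the public service’s response format as we understand it and should be checked against its documentation. The helper names are hypothetical.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

# Assumed public endpoint; a JSON reply is requested with the header
# "Accept: application/json" (not sent in this sketch).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def annotate_url(text: str, confidence: float = 0.5) -> str:
    """Build a GET URL for the annotate service (the request itself is not made here)."""
    return SPOTLIGHT_URL + "?" + urlencode({"text": text, "confidence": confidence})


def parse_annotations(payload: str) -> List[Dict[str, object]]:
    """Extract surface form, DBpedia URI and type labels from a Spotlight-style JSON reply."""
    data = json.loads(payload)
    records = []
    for res in data.get("Resources", []):
        # Types arrive as one comma-separated string; drop empty entries.
        types = [t for t in (res.get("@types") or "").split(",") if t]
        records.append({"surface": res.get("@surfaceForm"),
                        "uri": res.get("@URI"),
                        "types": types})
    return records
```

The type labels returned for each entity are what would allow a later step to keep only candidates of the same nature as the answer.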

CRediT authorship contribution statement

Yolanda Blanco-Fernández: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Alberto Gil-Solla: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. José J. Pazos-Arias: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Manuel Ramos-Cabrer: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Abdullah

Declaration of Competing Interest

The authors declare that there is no conflict of interest.

Acknowledgements

This work has been supported by the European Regional Development Fund (ERDF) through the Ministerio de Economía, Industria y Competitividad (Gobierno de España) research project TIN2017-87604-R. It has also been supported by the Galician Regional Government, under the agreement for funding the AtlantTIC Research Center for Information and Communication Technologies and its Program for the Consolidation and Structuring of Competitive Research Groups.

References (39)

  • R. Collobert et al. (2008). A unified architecture for natural language processing: deep neural networks with multitask learning. Proceedings of the 25th international conference on machine learning (ICML).

  • R. Collobert et al. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research.

  • J. Daiber et al. (2013). Improving efficiency and accuracy in multilingual entity extraction. Proceedings of the 9th international conference on semantic systems (i-semantics).

  • A. Daif et al. (2019). A mobile app to learn about cultural and historical associations in a closed loop with humanities experts. Applied Sciences.

  • K. Duan et al. (2003). Multi-category classification by soft-max combination of binary classifiers. Proceedings of 4th international conference on multiple classifier systems.

  • V.P. Dwivedi et al. (2017). Beyond word2vec: Embedding words and phrases in same vector space. Proceedings of international conference on natural language processing (ICON).

  • A. Gheorghilas et al. (2017). The challenges of the 21st-century museums: Dealing with sophisticated visitors in a sophisticated world. International Journal of Scientific Management and Tourism.

  • R. Glauber et al. (2018). A systematic mapping study on open information extraction. Expert Systems With Applications.

  • S. Gupta et al. (2014). SPIED: Stanford pattern based information extraction and diagnostics. Proceedings of workshop on interactive language learning, visualization and interfaces.

Cited by (7)

    • A simple and fast method for Named Entity context extraction from patents

      2021, Expert Systems with Applications
      Citation Excerpt :

      Unfortunately, it is not as easy to create annotated training sets for technical documents and entities, mainly for three reasons. First, domain specific entities are costly to tag, because they are rare with respect to more generic words (Chiarello, Fantoni, Bonaccorsi, et al., 2017); second, manually tagged data-set of technical documents have a big business value, thus they are not open-sourced by researchers or companies (Blanco-Fernández et al., 2020); third, manual tagging for entity specific tasks requires the time of experts in the chosen domain, and experts are known to have limited time (Chiarello, Trivelli, Bonaccorsi and Fantoni, 2018). For these reasons, in the present paper we aim at using the state-of-the-art in Natural Language Processing in order to create a domain specific (in terms of entity and document type) NER system, without relying on manual annotated data.

    • SkillNER: Mining and mapping soft skills from any text

      2021, Expert Systems with Applications
      Citation Excerpt :

      NER is a computational linguistic method capable of extracting and classifying named entities mentioned in unstructured text into predefined categories (such as person names, locations, and product names). Assigning a word to a semantic class provides crucial information for tasks such as question answering (Abujabal et al., 2018; Blanco-Fernández et al., 2020), topic disambiguation (Fernández et al., 2012) or detection (Krasnashchok & Jouili, 2018; Lo et al., 2017; Al-Nabki et al., 2019), and revealment of relationships among elements (Sarica et al., 2020; Amal et al., 2019). Furthermore, NER has proved to be effective in broader applications, such as user profiling (Nicoletti et al., 2013) and ontology development in unconventional domains (Oliva et al., 2019; Rodrigues et al., 2019).

    • Actor’s knowledge massive identification in the learning management system

      2021, Intelligent Systems and Learning Data Analytics in Online Education
    • On the Automatic Generation of Knowledge Connections

      2023, International Conference on Enterprise Information Systems, ICEIS - Proceedings