Distracting users as per their knowledge: Combining linked open data and word embeddings to enhance history learning
Introduction
The current impact of technology on users’ lifestyles and the universal access to information through the Internet pose major challenges for cultural institutions that preserve and promote heritage, urging a change in the kind of experiences offered to target audiences. Attracting users’ attention requires these institutions to be aware of their particular interests, preferences, needs and expectations, since, according to Gheorghilas, Dumbraveanu, Tudoricu, and Craciun (2017), current visitors are more and more “sophisticated” and want to be “provoked” (more than informed) by means of innovative and stimulating experiences. In this context, the goal is to shift from the simple provision of information (which is already available on the Web) to enhanced experiences based on the interpretation of historical facts driven by Humanities experts, open debates and active interaction with like-minded visitors through edutainment approaches (van Aalst & Boogaarts, 2002). In other words, it is necessary to switch from a simplistic and local exposition of historic events, characters and related artworks, which often disregards the interesting connections among them, to “immersive and engaging experiences where audiences can explore and discover stories about the world around them, past and present” (Carretero, Lopez, & Jacott, 1997; Gheorghilas, Dumbraveanu, Tudoricu, & Craciun, 2017; Seixas, 1994; Spoehr & Spoehr, 1994).
Providing this kind of challenging experience, aimed at improving History learning, is the main research problem we attempt to solve. Our first contributions in this field fall within the scope of the Horizon H2020 project Crosscult, where the authors have developed a pilot application that allows a Humanities expert to create edutainment experiences driven by narratives (texts) about different topics and quiz games for the target audience (see details in Daif et al., 2019). The expert is assisted in this task by a user-friendly tool that discovers associations among historical facts and concepts that are meaningful in the theme of the created experience, resorting to a central repository of knowledge that has been populated by retrieving resources from Linked Data initiatives (mainly DBpedia and Europeana).
Along with the nodes that represent concepts in each association, our quiz games incorporate so-called question nodes. Each question node presents a question about the topics of the expert-provided narrative, along with a set of choices that include the right answer (which is also identified by the expert) and other wrong alternatives called distractors. Arousing the users’ enthusiasm for our quiz games and inspiring them to discover more requires the distractors to be chosen carefully. First, distractors cannot be trivially false options, as this would cause the user to lose interest in the game. Second, distractors must be related to the right answer in a meaningful and ingenious manner, so that the quiz both prompts the users to reflect and holds their attention. Third, distractors must be selected considering each user’s particular level of knowledge of the theme of the expert-created experience, so that the responses selected for a knowledgeable visitor do not discourage less knowledgeable users (who will probably fail most questions if the same distractors are used for both profiles of players). Last but not least, our distractors should make it possible to deal with a huge diversity of themes, with the goal of attracting a wide spectrum of potential visitors with very different interests.
Meeting the above requirements, and especially addressing the diversity of visitor-oriented experiences currently demanded, requires high levels of customization that would be hard for the organization’s staff to manage manually. For that reason, we contribute a proposal that first builds a tailor-made corpus by retrieving (from Wikipedia) relevant documents about the theme of the experience. Next, we automatically select (from those texts) a set of meaningful distractors for each question, according to each user’s particular level of knowledge of the corresponding topic. For that purpose, we harness public structured knowledge repositories bound to Linked Open Data initiatives (where a huge number of concepts about History are linked and semantically described), Natural Language Processing (NLP) techniques, and the word embeddings of the Word2Vec tool, where knowledge discovery is supported by a neural network that is trained over the customized corpus (Mikolov, Chen, Corrado, & Dean, 2013).
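The idea of choosing distractors according to a user's level of knowledge can be illustrated with a minimal sketch. It assumes that every candidate entity already has an embedding vector (the hand-written two-dimensional vectors below merely stand in for Word2Vec output) and that a knowledge level maps to a band of cosine similarity to the right answer, with closer candidates reserved for more knowledgeable users. The function names, the band thresholds and the toy data are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical half-open similarity bands [low, high) per knowledge level.
# The upper bound 1.01 only ensures that a similarity of exactly 1.0 is included.
BANDS = {
    "novice": (-1.0, 0.6),
    "intermediate": (0.6, 0.85),
    "expert": (0.85, 1.01),
}

def pick_distractors(answer, vectors, level, k=3):
    """Return up to k candidates whose similarity to the answer falls in the band
    assumed for the given knowledge level, most similar first."""
    low, high = BANDS[level]
    scored = [
        (cosine(vectors[answer], vec), name)
        for name, vec in vectors.items()
        if name != answer
    ]
    in_band = sorted(((s, n) for s, n in scored if low <= s < high), reverse=True)
    return [name for _, name in in_band[:k]]

# Toy vectors (illustrative only, not trained embeddings).
TOY_VECTORS = {
    "Napoleon": (1.0, 0.0),
    "Wellington": (0.95, 0.31),
    "Nelson": (0.7, 0.7),
    "Cleopatra": (0.2, 0.98),
}

for level in ("novice", "intermediate", "expert"):
    print(level, pick_distractors("Napoleon", TOY_VECTORS, level))
```

Banding by similarity is only one possible way to encode difficulty; the thresholds would need to be calibrated against real embeddings and real players.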
Our approach brings significant benefits for cultural organizations, which are unlikely to have specialized staff capable of manually creating valuable games on a wide variety of topics (especially small and medium-sized institutions). Also, our automatic selection of distractors makes it possible to deploy custom-tailored games that cope with recurring changes in both the users’ preferences and the topics posed in highly dynamic experiences. This would make it possible, for instance, to select different distractors for a given question depending on the personal profile (i.e., interests and knowledge of the game theme) of the visitors who attend the venue throughout the day. A Humanities expert could not feasibly provide such visitor-oriented, tailor-made games in the absence of automatic assistance.
The selected distractors are finally suggested to the expert so that s/he can integrate them during the creation of the experience. Our quiz games are aligned with the challenges described at the beginning of this section, since the distractors can be exploited, for instance, in competitions among visitors who are located in different venues, or in open debates (held physically or remotely) driven by experts who help interpret interesting historic facts that have been interconnected during the development of the quiz game. Both experiences boost learning while visitors have fun, and promote their reflection and knowledge retention, thus gaining widespread appeal (Daif et al., 2019). In addition, the expert-provided narrative might also link historic topics to annual commemorations (e.g., international day of women, abolition of slavery, world television day...) in order to shape engaging quiz games that explore appealing connections between past and present on the subject commemorated on that date. For instance, distractors could reveal connections between people who have been persecuted throughout History for their sexual orientation or religious beliefs, in the context of ad-hoc experiences created, for example in a museum, on the occasion of the international day of sexual and religious freedom. We argue that this kind of experience helps trigger the visitors’ curiosity and awaken their interest in coming back to the venue to discover more.
This paper is organized as follows. Section 2 details the NLP technologies and Linked Data initiatives adopted in related work, highlighting the main functionalities we exploit (and even adapt) in our approach. Section 3 focuses on the procedure for collecting the documents of our tailored historical corpus. The algorithmic internals of our distractor identification strategy are detailed in Section 4. Next, Section 5 focuses on the experimental validation, where our new strategy is compared to the former approach to distractor selection we explored in Daif et al. (2019). Lastly, Section 6 concludes the paper and points out some lines of further research.
Section snippets
Technologies involved in our framework
As mentioned, the goal of this research is to identify appropriate distractors, given a question and its right answer. A distractor can be defined as an entity of the same nature as the answer that shares with it some features likely to cause mistakes when selecting the answer. That is, distractors are used to check how solid the user’s knowledge of a theme is.
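The definition above (same nature as the answer, plus some shared features) can be read as a simple candidate filter. The sketch below is only an illustration under stated assumptions: the `Entity` class, the type labels, the feature sets and the `min_shared` parameter are all hypothetical, and the paper's actual procedure relies on Linked Open Data and word embeddings rather than on hand-written feature sets.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # hypothetical type label, e.g. "Person" or "Place"
    features: frozenset = field(default_factory=frozenset)

def candidate_distractors(answer, pool, min_shared=1):
    """Keep entities of the same kind as the answer that share at least
    min_shared features with it, ordered by number of shared features
    (ties broken alphabetically)."""
    kept = []
    for entity in pool:
        if entity.name == answer.name or entity.kind != answer.kind:
            continue  # skip the answer itself and entities of a different nature
        shared = len(entity.features & answer.features)
        if shared >= min_shared:
            kept.append((shared, entity.name))
    kept.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in kept]

# Toy data (illustrative only).
ANSWER = Entity("Isabella I of Castile", "Person",
                frozenset({"monarch", "15th century", "Spain"}))
POOL = [
    Entity("Ferdinand II of Aragon", "Person",
           frozenset({"monarch", "15th century", "Spain"})),
    Entity("Henry VIII of England", "Person",
           frozenset({"monarch", "16th century", "England"})),
    Entity("Leonardo da Vinci", "Person",
           frozenset({"artist", "15th century", "Italy"})),
    Entity("Granada", "Place", frozenset({"Spain", "15th century"})),
]

print(candidate_distractors(ANSWER, POOL))
print(candidate_distractors(ANSWER, POOL, min_shared=2))
```

Raising `min_shared` yields distractors that are harder to tell apart from the answer, which is one way such a filter could be tuned to the player's expertise.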
These distractors will be extracted from a repository of information about the theme of the expert-proposed game. Even
Gathering a tailor-made corpus for each experience
In this section we describe how we collect the documents of the tailor-made corpus required to train the Word2Vec neural network, which uncovers the (co-occurrence-based) connections exploited in our distractor identification strategy (details will be given in Section 4). This process follows the guidelines below to decide whether or not to add a new document to the corpus:
• First, the documents must be about the topics of the experience, which is contextualized by the expert-created brief
Computing distractors
We are now armed with a set of interesting tools to process the information in an intelligent way (Section 2) and we have access to a big dataset to browse and play with (Section 3). We are ready to describe the algorithm that suggests distractors to the expert, who uses them to complete the quizzes that check the user’s knowledge of the theme of the experience. This algorithm goes through three clearly differentiated phases: an initial one, supported by the
Experimental evaluation
The experiments we have carried out pursued a two-fold goal:
• On the one hand, we wanted to verify the effectiveness of the refinements developed in our new distractor identification strategy, compared to the preliminary approach we explored in Daif et al. (2019).
• On the other hand, we needed to confirm the effect of each player’s particular expertise on the automatic selection of customized distractors.
Conclusions and further work
From the technical point of view, the importance of our work lies in the ability to discover new knowledge from the raw text sources available in Wikipedia, by exploiting synergies between the semantically interlinked structured data from LOD repositories (mainly DBpedia, YAGO and Wikidata) and NLP techniques (specifically, Named Entity Recognition and word embeddings). In particular, NER tools make it possible to identify DBpedia named entities in Wikipedia articles and to use their semantic
CRediT authorship contribution statement
Yolanda Blanco-Fernández: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Alberto Gil-Solla: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. José J. Pazos-Arias: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Manuel Ramos-Cabrer: Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review & editing. Abdullah
Declaration of Competing Interest
The authors declare that there is no conflict of interest.
Acknowledgements
This work has been supported by the European Regional Development Fund (ERDF) through the Ministerio de Economía, Industria y Competitividad (Gobierno de España) research project TIN2017-87604-R, and through the Galician Regional Government under the agreement for funding the AtlantTIC Research Center for Information and Communication Technologies and its Program for the Consolidation and Structuring of Competitive Research Groups.
References (39)
Backpropagation and stochastic gradient descent method. Neurocomputing (1993).
Exploring historical events. International Journal of Educational Research (1997).
A structural probe for finding syntax in word representations. Proceedings of Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2019).
Microblog semantic context retrieval system based on linked open data and graph-based theory. Expert Systems With Applications (2016).
Media meets semantic web - how the BBC uses DBpedia and linked data to make connections. Proceedings of the 6th European Semantic Web Conference (ESWC) (2009).
Student’s understanding of historical significance. Theory and Research in Social Education (1994).
YAGO: A multilingual knowledge base from Wikipedia, WordNet, and GeoNames. Proceedings of the 15th International Semantic Web Conference (ISWC) (2016).
A neural probabilistic language model. Journal of Machine Learning Research (2003).
Ficlone: improving DBpedia Spotlight using named entity recognition and collective disambiguation. Open Journal of Semantic Web (2018).
Semantic parsing for text to 3D scene generation. Proceedings of ACL 2014 Workshop on Semantic Parsing (2014).
A unified architecture for natural language processing: deep neural networks with multitask learning. Proceedings of the 25th International Conference on Machine Learning (ICML).
Natural language processing (almost) from scratch. Journal of Machine Learning Research.
Improving efficiency and accuracy in multilingual entity extraction. Proceedings of the 9th International Conference on Semantic Systems (I-Semantics).
A mobile app to learn about cultural and historical associations in a closed loop with humanities experts. Applied Sciences.
Multi-category classification by soft-max combination of binary classifiers. Proceedings of 4th International Conference on Multiple Classifier Systems.
Beyond word2vec: Embedding words and phrases in same vector space. Proceedings of International Conference on Natural Language Processing (ICON).
The challenges of the 21st-century museums: Dealing with sophisticated visitors in a sophisticated world. International Journal of Scientific Management and Tourism.
A systematic mapping study on open information extraction. Expert Systems With Applications.
Spied: Stanford pattern based information extraction and diagnostics. Proceedings of Workshop on Interactive Language Learning, Visualization and Interfaces.