A survey of event extraction methods from text for decision support systems
Introduction
Over the years, Information Extraction (IE) has become increasingly popular as a tool for a vast array of applications [1], [2], [3], [4], [5]. At first, the IE field focused particularly on message understanding in newswires. However, with the emergence of progressively larger digital collections of natural language text of various types, such as news messages, articles, and web pages, researchers and practitioners now require more advanced techniques that extract more information with greater accuracy, in real time, and on larger scales than ever before. Since the early 2000s, there has been a notable shift from general information extraction from digital collections (extracting basic named entities like persons and organizations) toward more advanced forms of text mining, including Event Extraction (EE), which requires handling textual content or data describing complex relations between entities [6]. This development has been fueled by the continuous advances in Text Mining (TM) and Natural Language Processing (NLP), the advent of big data, and the availability of (manually) annotated data sets that often serve as a basis for building extraction models.
Event extraction combines knowledge and experience from a number of domains, including computer science, linguistics, data mining, artificial intelligence, and knowledge modeling. It is commonly seen as the TM-aided extraction of complex combinations of relations between actors (entities), performed after executing a series of initial NLP steps. It is a form of IE, aimed at specific users, applications, and platforms, that results in more complex and detailed outputs than regular IE. Event extraction originated in the late 1980s, when the U.S. Defense Advanced Research Projects Agency (DARPA) boosted research into message understanding, aimed at automating the identification of terrorism-related events from newswires, a topic that has remained of interest to this day.
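To make the idea of extraction after initial NLP steps concrete, the following minimal Python sketch chains a naive sentence splitter with a single hand-written pattern for acquisition events. The pattern, the function names (`split_sentences`, `extract_acquisitions`), and the example sentence are illustrative assumptions and are not taken from any surveyed system; practical systems rely on far richer preprocessing and pattern languages.

```python
import re

# Hypothetical entity pattern: one or more capitalized words, e.g. "Acme Corp".
ENTITY = r"[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical hand-written pattern for one event type:
# "<buyer> acquires|acquired|buys|bought <target>".
ACQUISITION = re.compile(
    rf"(?P<buyer>{ENTITY}) (?:acquires|acquired|buys|bought) (?P<target>{ENTITY})"
)


def split_sentences(text):
    """Naive initial NLP step: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_acquisitions(text):
    """Return one dictionary per acquisition event matched by the pattern."""
    events = []
    for sentence in split_sentences(text):
        for match in ACQUISITION.finditer(sentence):
            events.append(
                {
                    "type": "acquisition",
                    "buyer": match.group("buyer"),
                    "target": match.group("target"),
                    "sentence": sentence,
                }
            )
    return events


if __name__ == "__main__":
    news = "Yesterday, Acme Corp acquired Widget Labs. Shares rose."
    for event in extract_acquisitions(news):
        print(event)
```

The sketch only recognizes one surface form of one event type, which illustrates why pattern construction costs and domain dependency are recurring concerns.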
With the exponential growth of digital collections and of the information extraction requirements in various fields, event extraction research has evolved greatly. Early mentions of modern event extraction can be found in the biomedical literature, where NLP techniques have traditionally been employed for discovering biological entities such as genes and proteins, but where the same techniques are now also widely used for identifying events involving these entities, e.g., gene expressions and protein bindings [7]. Gradually, event extraction has moved to other domains such as politics and finance, where events like elections, CEO changes, or acquisitions also comprise sets of entities (e.g., persons, governments, countries) and their relations (e.g., leadership, competitor, ownership) [8], [9].
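The view of an event as a set of entities plus the relations between them can be written down as a small data structure. The sketch below is one possible representation in Python; the class names (`Entity`, `Relation`, `Event`), the field names, and the CEO-change example with its invented company and person are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "person", "company", "country"


@dataclass(frozen=True)
class Relation:
    label: str  # e.g. "leadership", "competitor", "ownership"
    source: Entity
    target: Entity


@dataclass
class Event:
    event_type: str  # e.g. "election", "ceo_change", "acquisition"
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def participants(self, kind):
        """Return all entities of the given kind that take part in the event."""
        return [e for e in self.entities if e.kind == kind]


def example_ceo_change():
    """Build a hypothetical CEO-change event (all names are invented)."""
    person = Entity("Jane Doe", "person")
    company = Entity("Acme Corp", "company")
    return Event(
        event_type="ceo_change",
        entities=[person, company],
        relations=[Relation("leadership", person, company)],
    )


if __name__ == "__main__":
    print(example_ceo_change())
```

Keeping entities and relations as separate lists mirrors the description above, in which the same entity types recur across domains while the relations distinguish one event type from another.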
The detailed information that is commonly extracted from a heterogeneous set of sources in event extraction implementations is becoming increasingly important for supporting decision-making processes. Today, the applications of events in decision support systems are plentiful. For instance, events can be used in mediation information systems [10], for the analysis of firm-specific social media [11], or even for advanced spatio-temporal reasoning about moving objects [12] and vehicle routing [13]. Other popular applications of events lie in environmental scanning [14], news personalization systems [15], algorithmic trading [16], financial risk analysis [17], e-commerce [18], quality assurance [19], [20], and terrorism detection [21].
Such event-based decision support systems commonly define an event as something that is regarded as happening during a particular interval of time. Events can have multiple occurrences and are generally seen as incidents of substantial importance. In this work, we do not consider organized events such as soccer matches, scientific conferences, and parties, but focus on unexpected occurrences that need to be acted upon. Such events are universally associated with state changes. However, their definition, complexity, and interpretation can differ greatly per domain.
Irrespective of their domains, extracted events are associated with changes in the state of the current knowledge, and hence can be employed for decision making, prediction, or monitoring. The applications are numerous, e.g., generating trading signals for stock exchange markets, providing event-driven data integration in decision support systems, creating social media monitoring systems for police departments, and discovering defects in products. Hence, traders, managers, and companies are among the users who benefit most directly from event extraction.
Despite the envisaged usefulness and wide prospective applicability of event extraction, several hurdles have to be overcome before event extraction is widely adopted as a supportive tool in practice. The main requirements that were posed for information extraction in the nineties [2] still apply to event extraction today. For instance, the technologies should deliver sufficiently accurate results. Furthermore, construction and processing costs should be minimized, and systems should preferably be operable by non-specialists. These challenging requirements have led to many research efforts in the last decade, of which the main ideas are surveyed in this article.
Although IE in general is certainly a heavily researched and well-described area, to our knowledge, there is little overview work focusing on the emerging field of event extraction. Therefore, in order to aid researchers and practitioners in making well-informed decisions about their event extraction applications, we survey high-performance extraction techniques and their common applications in decision support systems. While a preliminary survey on event extraction from text already exists [22], here we provide a more complete overview at a higher level of abstraction, and also cover the most recent works. Moreover, we evaluate the various approaches to event extraction on more (qualitative) dimensions. We discuss common applications of event extraction in decision support systems and additionally focus on the evaluation of event extraction methods. Finally, current research issues in event extraction from text are highlighted.
Techniques
Both in recent research and in practice, a great many different event extraction techniques have been proposed and applied. In the following discussion of the main techniques that are employed for event extraction, we omit the peculiarities of individual approaches and focus on several aspects of commonly applied extraction techniques, identifying their unique properties, advantages, and disadvantages.
A common distinction of event extraction approaches stems from the field of
Decision support applications
The applications of event extraction in decision support systems are very diverse, and can be divided into two major fields. First, event extraction has a wide range of uses in the biomedical domain [6], [7], [24], [25], [38], [41], [46], for instance for identifying molecular events, protein bindings, and gene expressions, which can subsequently be used in biomedical research. Fig. 1 is a typical
Evaluation
For the evaluation of event extraction methods, researchers often rely on quantitative indicators, measuring performance using a gold standard-based approach. Data sets, consisting of news messages, documents, articles, etc., are annotated by domain experts, meticulously detailing the events that should be found by the (semi-)automatic event extraction approaches. In accordance with IE and TM, performance is generally measured by computing the number of true positives and negatives, as well
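A gold standard-based comparison of this kind can be sketched in a few lines of Python. The sketch assumes that events are compared by exact match on a tuple of (type, argument, argument) and that the reported indicators are the conventional precision, recall, and F1 score; the function name `evaluate` and the example events are hypothetical, and the matching criterion in a real evaluation may be more lenient.

```python
def evaluate(gold, predicted):
    """Compare predicted events against expert-annotated gold events.

    Both arguments are iterables of hashable event descriptions; an event
    counts as correct only if it matches a gold event exactly (an assumption).
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)   # events found and annotated
    false_pos = len(predicted - gold)  # events found but not annotated
    false_neg = len(gold - predicted)  # annotated events that were missed

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical annotations and system output (all names are invented).
GOLD = {
    ("acquisition", "Acme Corp", "Widget Labs"),
    ("ceo_change", "Initech", "Jane Doe"),
    ("acquisition", "Globex", "Hooli"),
}
PREDICTED = {
    ("acquisition", "Acme Corp", "Widget Labs"),
    ("ceo_change", "Initech", "Jane Doe"),
    ("acquisition", "Initech", "Globex"),
}

if __name__ == "__main__":
    print(evaluate(GOLD, PREDICTED))
```

In this example two of the three predicted events are correct and two of the three annotated events are found, so precision and recall are both two thirds.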
Research issues
In event extraction, there are many open research issues and points of particular interest, of which the main ones are related to: 1) the context-based advantage of data-driven, knowledge-driven, or hybrid approaches, 2) understanding the limitations of specific event extraction techniques, 3) the domain-dependency of event extraction procedures, affecting both their flexibility and effectiveness, 4) the scalability of event extraction approaches when dealing with big data, and 5) the
Conclusion
Event extraction has recently gained in popularity due to its wide applicability for various purposes. In this article, we reviewed the various data-driven, knowledge-driven, and hybrid techniques of event extraction, and evaluated the works on a set of qualitative dimensions, i.e., the amount of required data, knowledge, and expertise, as well as the interpretability of the results and the required development and execution times. We identified the major strengths and weaknesses of the main
Acknowledgments
The authors are partially supported by the NWO Physical Sciences Free Competition project 612.001.009: Financial Events Recognition in News for Algorithmic Trading (FERNAT) and the Dutch national program COMMIT.
References (54)
- et al. A lexico-semantic pattern language for learning ontology instances from text. J. Web Semant.: Sci., Serv. Agents World Wide Web (2012)
- et al. Discovering company revenue relations from news: a network approach. Decis. Support Syst. (2009)
- et al. Event-driven agility of interoperability during the run-time of collaborative processes. Decis. Support Syst. (2014)
- et al. Analyzing firm-specific social media and market: a stakeholder-based event analysis framework. Decis. Support Syst. (2014)
- et al. Specifying and detecting spatio-temporal events in the Internet of things. Decis. Support Syst. (2013)
- et al. An event-driven optimization framework for dynamic vehicle routing. Decis. Support Syst. (2012)
- et al. Event detection from online news documents for supporting environmental scanning. Decis. Support Syst. (2004)
- et al. Vehicle defect discovery from social media. Decis. Support Syst. (2012)
- et al. What's buzzing in the blizzard of buzz? Automotive component isolation in social media postings. Decis. Support Syst. (2013)
- et al. Web mining for event-based commonsense knowledge using lexico-syntactic pattern matching and semantic role labeling. Expert Syst. Applic. (2010)
- Ontology-based fuzzy event extraction agent for Chinese e-news summarization. Expert Syst. Applic.
- Multi-lingual support for lexicon-based sentiment analysis guided by semantics. Decis. Support Syst.
- Introduction to information extraction. AI Commun.
- Information extraction. Commun. ACM
- Open information extraction from the web. Commun. ACM
- Information extraction: a multidisciplinary approach to an emerging information technology. Ch. Information Extraction: Techniques and Challenges
- Information extraction: algorithms and prospects in a retrieval context
- Complex event extraction at PubMed scale. Bioinformatics
- Event extraction from biomedical papers using a full parser
- Semi-automatic financial events discovery based on lexico-semantic patterns. Int. J. Web Eng. Technol.
- An automated framework for incorporating news into stock trading strategies. IEEE Trans. Knowl. Data Eng.
- Intelligent information processing IV, vol. 288 of IFIP International Federation for Information Processing. Ch. A Risk Assessment System with Automatic Extraction of Event Types
- Developing and Executing Electronic Commerce Applications with Occurrences
- Terrorism information extraction from online reports. J. Comput. Inf. Syst.
- An overview of event extraction from text
- A system for detecting and tracking Internet news event
- A Markov logic approach to bio-molecular event extraction
Frederik Hogenboom obtained the MSc degree in economics and informatics cum laude from the Erasmus University Rotterdam, the Netherlands, in 2009, specializing in computational economics. In 2014, he received the PhD degree in economics and informatics at the same university, after a PhD track in which he focused on ways to employ financial event discovery in emerging news for algorithmic trading, thereby combining techniques from various disciplines, among which Semantic Web, text mining, artificial intelligence, machine learning, linguistics, and finance. Other research interests are related to search technologies, applications of computer science in economic environments, agent-based systems, and applications of the Semantic Web.
Flavius Frasincar obtained the master's degree in computer science from Politehnica University Bucharest, Romania, in 1998. In 2000, he received the professional doctorate degree in software engineering from Eindhoven University of Technology, the Netherlands. He received the PhD degree in computer science from Eindhoven University of Technology, the Netherlands, in 2005. Since 2005, he has been an assistant professor in information systems at Erasmus University Rotterdam, the Netherlands. He has published extensively in the areas of databases, Web information systems, personalization, and the Semantic Web. He is a member of the editorial board of the International Journal of Web Engineering and Technology and a senior editor of Decision Support Systems.
Uzay Kaymak received the MSc degree in electrical engineering, the Degree of Chartered Designer in information technology, and the PhD degree in control engineering from the Delft University of Technology, Delft, the Netherlands, in 1992, 1995, and 1998, respectively. From 1997 to 2000, he was a Reservoir Engineer with Shell International Exploration and Production. He holds the chair of information systems in healthcare at the School of Industrial Engineering, Eindhoven University of Technology, the Netherlands. Prof. Kaymak has co-authored more than 200 academic publications in the fields of intelligent decision support systems, computational intelligence, data mining, and computational modeling methods. He is an associate editor of IEEE Transactions on Fuzzy Systems and a member of the editorial board of several journals.
Franciska de Jong has been a full professor of language technology at the University of Twente, the Netherlands, since 1992. She is also affiliated with the Erasmus University Rotterdam, the Netherlands, where she is director of the Erasmus Studio. She studied Dutch language and literature at the University of Utrecht, the Netherlands, did a PhD track in theoretical linguistics, and started to work on language technology in 1985 at Philips Research, where she worked on machine translation. Currently, her main research interests are in the fields of multimedia indexing, text mining, semantic access, cross-language retrieval, and the disclosure of cultural heritage collections (in particular spoken audio archives). She is frequently involved in international programme committees, expert groups, and review panels, and has initiated a number of European Union projects. Since 2008, she has been a member of the Governing Board of the Netherlands Organization for Scientific Research (NWO).
Emiel Caron received the MSc degree in information management at the University of Tilburg, the Netherlands, and subsequently completed a PhD track in business economics at the Erasmus University Rotterdam, the Netherlands, in 2013. Currently, he is an assistant professor in business analytics at the Erasmus University Rotterdam. He specializes in multi-dimensional OLAP databases, data warehouse solutions on various software and database platforms, and business applications of data mining and data analysis. His research interests are business intelligence, data mining, and business process modeling. Previously, he worked at the Centre for Science and Technology Studies (CWTS) and the Dutch IT company PinkRoccade (now Getronics).