Selection criteria for text mining approaches

https://doi.org/10.1016/j.chb.2014.10.062

Highlights

  • Text mining includes several techniques, such as text categorization and clustering.

  • Text mining techniques can be used to find useful information in documents.

  • We propose some criteria to evaluate the effectiveness of text mining techniques.

  • These proposed criteria can facilitate the selection of an appropriate technique.

Abstract

Text mining techniques include text categorization, summarization, topic detection, concept extraction, search and retrieval, document clustering, etc. Each of these techniques can be used to find non-trivial information in a collection of documents. Text mining can also be employed to detect a document’s main topic or theme, which is useful in creating a taxonomy from the document collection. Application areas for text mining include publishing, media, telecommunications, marketing, research, healthcare, medicine, etc. Text mining has also been applied in many applications on the World Wide Web for developing recommendation systems. We propose here a set of criteria to evaluate the effectiveness of text mining techniques, in an attempt to facilitate the selection of an appropriate technique.

Introduction

The value of mining knowledge, whether from data or from text, in important and relatively large databases has been recognized by numerous scholars and researchers. Data mining, or knowledge discovery, works well on data stored in a structured manner. Often, however, data that has not been well structured still contains a lot of hidden information. Text mining entails automatically analyzing a corpus of text documents and discovering previously hidden information. The result might be another piece of text or a visual representation. We start by extracting useful information from the text, such as facts and events, and then perform data mining tasks to gain new knowledge. Text mining generally includes categorization of information or text, text clustering, entity or concept extraction, and the development and formulation of general taxonomies.

Text mining deals with unstructured or textual information in order to extract meaningful information and knowledge from huge amounts of text. Text mining techniques are required for the efficient analysis and exploration of information available in text form. They convert text into data, which then passes through other data mining techniques for analysis. Most of the time, the data that we gather from different sources is so large that we cannot read and analyze it manually, so we need text mining techniques to deal with it. Identifying and separating out any specific type of information from a given text requires text mining techniques or methods. These methods also help in clustering the data into different groups on the basis of specific requirements. In the field of education, text mining techniques help to explore and analyze the large amount of data coming from new discoveries and research made on a daily basis. Text mining methods are also required whenever we need to validate extensive data by analyzing it against special criteria. Text mining includes statistical, linguistic, and machine learning techniques that are needed for studying and examining the textual information required for further data analysis, research, and investigation.
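To illustrate how text can be converted into data and then grouped, the following is a minimal Python sketch rather than the authors' method. It assumes a simple bag-of-words representation and uses cosine similarity as the grouping signal; the three-sentence corpus and the function names (tokenize, to_vector, cosine_similarity) are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical corpus, invented for illustration only.
documents = [
    "Clustering groups similar documents together",
    "Document clustering finds groups of similar text",
    "The patient received a new medicine",
]

# The vocabulary is every distinct token in the corpus, in sorted order.
vocabulary = sorted({token for doc in documents for token in tokenize(doc)})
vectors = [to_vector(doc, vocabulary) for doc in documents]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # two clustering sentences
sim_02 = cosine_similarity(vectors[0], vectors[2])  # clustering vs. medical sentence
print(sim_01 > sim_02)
```

In this toy corpus the two sentences about clustering share several terms and therefore score higher than the unrelated medical sentence, which is the kind of signal a clustering method could build on.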

According to the available literature and applications, text mining is used heavily in different domains, such as:

  • Web document based text clustering (Ahmad and Khanum, 2010, Bhushan et al., 2014, Navaneethakumar and Chandrasekar, 2012).

  • Information retrieval (Rath et al., 2011, Senellart and Blondel, 2008, Vashishta and Jain, 2011).

  • Knowledge transfer and integration (Achtert et al., 2006, Kriegel et al., 2009, Silwattananusarn and Tuamsuk, 2012).

  • Topic tracking (Krause et al., 2006, Patel and Sharma, 2014).

  • Summarization, categorization, clustering, and concept linkage (Caropreso et al., 2009, Kriegel et al., 2009, Lehmam, 2010, Lincy Liptha et al., 2010, Navaneethakumar and Chandrasekar, 2012, Patel and Sharma, 2014, Senellart and Blondel, 2008).

  • Information visualization and question answering (Burley, 2010, Don et al., 2007).

  • Emotional contents of texts in online social networks (Dhawan et al., 2004, Dhawan et al., 2014, Shelke, 2014).

  • Data collection, database schemas, data processing (Don et al., 2007, Kiyavitskaya et al., 2006, Tan and Lambrix, 2009, Zhai et al., 2004).

  • …. etc.

There is a need for fast, automatic, and intelligent computational power that can deal with huge amounts of data, extract the required information, and help us predict future developments in a short time, e.g., in business, education, and security systems. Text mining has many advantages:

  • It helps extract useful information from bulk data quickly and efficiently.

  • It assists in predicting future developments based on the provided observations and statistics.

  • It helps build patterns from the provided data that reveal increasing or decreasing trends, e.g., in business and the economy.

  • Text mining software also helps security agencies by monitoring and analyzing textual data gathered from internet sources such as blogs.

Another advantage of text mining techniques is their use in biomedical databases, where they improve literature search. Text mining methods advance the analysis, storage, and availability of information on different websites and search engines, making searching more efficient and more accurate. Text mining also deals with lexical analysis and pattern recognition and helps to study word frequency distributions. The text mining process has the basic stages depicted in Fig. 1.
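As a small illustration of studying a word frequency distribution, the sketch below counts tokens in a single sentence using only the Python standard library. The sample sentence and the function name word_frequencies are hypothetical; real studies would typically add steps such as stop-word removal, which are omitted here.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Hypothetical sample sentence, invented for illustration only.
sample = "Text mining turns text into data, and data mining turns data into knowledge."
freq = word_frequencies(sample)

# Relative frequencies: each count divided by the total number of tokens.
total = sum(freq.values())
distribution = {word: count / total for word, count in freq.items()}

print(freq.most_common(1))
```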

Section snippets

Related work

Text mining involves all activities in the discovery of information and other pertinent data from a variety of textual sources. However, the extracted data has always been of little value in its raw format. In many instances, people confuse text mining with regular web search. Although both result in the acquisition of data, a large gap exists in the input. In a common web search, users are dedicated to acquiring specific data, which mostly entails looking for known and/or specified

The proposed selection criteria

In this work, we propose a selection technique that weights text mining criteria according to the number of papers that emphasize each specific criterion. We calculated the criteria weights after surveying more than 130 research papers on different text mining techniques. Each publication could include several criteria. We used a scale of 2–7 to determine the weights of the selected criteria. For example, if we are only interested in 12
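The snippet above does not state how paper counts are mapped onto the 2–7 scale. Purely as an illustration, the sketch below assumes a linear rescaling in which the least emphasized criterion receives 2 and the most emphasized receives 7. The criterion names, the counts, and the function name scale_weights are all hypothetical and are not taken from the survey.

```python
def scale_weights(paper_counts, low=2.0, high=7.0):
    """Linearly rescale per-criterion paper counts onto the range [low, high].

    Assumption: a linear min-max mapping; the source does not specify one.
    """
    smallest = min(paper_counts.values())
    largest = max(paper_counts.values())
    if largest == smallest:
        # All criteria are equally emphasized: give each the midpoint weight.
        midpoint = (low + high) / 2
        return {name: midpoint for name in paper_counts}
    span = largest - smallest
    return {
        name: low + (high - low) * (count - smallest) / span
        for name, count in paper_counts.items()
    }


# Hypothetical counts of papers emphasizing each criterion (not survey data).
paper_counts = {"accuracy": 60, "scalability": 35, "interpretability": 10}
weights = scale_weights(paper_counts)
print(weights)
```

Under this assumed mapping, a criterion emphasized by a number of papers halfway between the minimum and the maximum lands at the middle of the scale.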

Discussion and conclusion

In text mining, the use of machine learning and data mining approaches has produced different tools and techniques that have been well studied and examined in the literature. Text mining has been applied in wide areas of research, including eLearning, social networking, bioinformatics, pattern matching, user experience, intelligent tutoring systems, etc.

Most text mining techniques are based on different approaches such as clustering, classification, relationship mining, and pattern matching (

Acknowledgement

This work was supported by the College of Computer and Information Sciences, King Saud University. The authors are grateful for this support.

References (33)

  • Achtert, E., Böhm, C., Kriegel, H. P., Kröger, P., Müller-Gorman, I., Zimek. Finding hierarchies of subspace clusters....
  • R. Ahmad et al. Document topic generation in text mining by using cluster analysis with EROCK. International Journal of Computer Science & Security (IJCSS) (2010)
  • AlSumait, L., & Domeniconi, C. (2007). Text clustering with local semantic...
  • J. Bhushan et al. Searching research papers using clustering and text mining. International Journal of Emerging Technology and Advanced Engineering (2014)
  • D. Burley. Information visualization as a knowledge integration tool. International Journal of Knowledge Management Practice (2010)
  • Caropreso, M. F., Matwin, S., & Sebastiani, F. (2009). Statistical phrases in automated text...
  • S. Dhawan et al. A framework for polarity classification and emotion mining from text. International Journal of Engineering and Computer Science (2004)
  • S. Dhawan et al. Emotion mining techniques in social networking sites. International Journal of Information & Computation Technology (2014)
  • Don, A., Zheleva, E., Gregory, M., Tarkan, S., Auvil, L., Clement, T., et al. (2007). Discovering interesting usage...
  • Gharehchopogh, F. S., & Abbasi Khalifehlou, Z. (2011). Study on information extraction methods from text mining and...
  • V. Gupta. A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence (2009)
  • Howland, P., & Park, H. (2007). Cluster-preserving dimension reduction methods for document...
  • Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J., & Mylopoulos, J. (2006). Text mining through semi automatic semantic...
  • S.B. Kotsiantis. Supervised machine learning: A review of classification techniques. Informatica (2007)
  • Krause, A., Leskovec, J., & Guestrin, C. (2006). Data association for topic intensity tracking. In International...
  • H.P. Kriegel et al. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. Transactions on Knowledge Discovery from Data (New York, NY: ACM) (2009)