Abstract
Stemming algorithms (stemmers) convert words to their root form (stem); this step belongs to the pre-processing stage of information retrieval systems. Stemmers affect indexing time by reducing the size of the index file, and they improve the performance of the retrieval process. Of the several stemming algorithms available, the Porter stemming algorithm is the most widely used because of its efficiency, simplicity, speed, and ease of handling exceptions. It nevertheless has drawbacks, and the many attempts made to improve its structure have been incomplete. This paper provides background on the retrieval technique and proposes a new stemming algorithm, the Enhanced Porter’s Stemming Algorithm (EPSA). EPSA aims to overcome the drawbacks of the Porter algorithm and to improve web searching. To measure its performance, EPSA was applied to two datasets. The results show improved precision over the original Porter algorithm at approximately the same recall.
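To make the two ideas in the abstract concrete, the following is a minimal sketch, under stated assumptions, of how stemming shrinks an index vocabulary and how precision and recall are computed. The suffix list and the function names `toy_stem` and `precision_recall` are hypothetical illustrations; the stripping rule is a deliberately naive stand-in and is neither the Porter algorithm nor EPSA.

```python
# Toy illustration only: a naive suffix stripper, NOT the Porter algorithm or EPSA.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if at least three characters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = hits / |retrieved|; recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four surface forms collapse to a single index term after stemming.
words = ["connect", "connected", "connecting", "connects"]
index_terms = {toy_stem(w) for w in words}
print(len(words), "word forms ->", len(index_terms), "index term(s)")

# Hypothetical document ids: four retrieved, three relevant, two in common.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
print(f"precision={p:.2f} recall={r:.2f}")
```

A rule set this small conflates unrelated words and misses most suffixes, which is the kind of error that rule-based stemmers such as Porter's are designed to limit.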
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Hajeer, S.I., Ismail, R.M., Badr, N.L., Tolba, M.F. (2017). A New Stemming Algorithm for Efficient Information Retrieval Systems and Web Search Engines. In: Hassanien, A., Mostafa Fouad, M., Manaf, A., Zamani, M., Ahmad, R., Kacprzyk, J. (eds) Multimedia Forensics and Security. Intelligent Systems Reference Library, vol 115. Springer, Cham. https://doi.org/10.1007/978-3-319-44270-9_6
DOI: https://doi.org/10.1007/978-3-319-44270-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44268-6
Online ISBN: 978-3-319-44270-9
eBook Packages: Engineering, Engineering (R0)