A New Stemming Algorithm for Efficient Information Retrieval Systems and Web Search Engines

  • Chapter
Multimedia Forensics and Security

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 115))

Abstract

Stemming algorithms (stemmers) reduce words to their root form (stem); this step belongs to the pre-processing stage of information retrieval systems. Stemmers affect indexing time by reducing the size of the index file, and they improve the performance of the retrieval process. Of the several stemming algorithms available, the Porter stemming algorithm is the most widely used because of its efficiency, simplicity, speed, and ease of handling exceptions. It nevertheless has drawbacks, and the many attempts made to improve its structure have been incomplete. This paper presents an efficient information retrieval technique and proposes a new stemming algorithm called the Enhanced Porter’s Stemming Algorithm (EPSA). The objective of this technique is to overcome the drawbacks of the Porter algorithm and improve web searching. The EPSA was applied to two datasets to measure its performance. The results show improved precision over the original Porter algorithm at approximately the same recall.
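To make the terms in the abstract concrete, here is a minimal sketch of how a stemmer conflates word variants into one index term and how precision and recall are computed. It is a toy illustration only: the suffix list and the names `toy_stem` and `precision_recall` are assumptions made for this example and are neither the Porter algorithm nor EPSA.

```python
# Toy illustration only: this is NOT the Porter algorithm or EPSA.
# The suffix list below is a hypothetical rule set chosen for the example.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four surface forms collapse into a single index term,
# which is how stemming shrinks the index file.
words = ["connect", "connected", "connecting", "connects"]
index_terms = {toy_stem(w) for w in words}

# Hypothetical document IDs: 4 retrieved, 3 relevant, 2 in common.
p, r = precision_recall({1, 2, 3, 4}, {2, 3, 5})
```

A naive rule list like this one over-stems and under-stems many English words, which is the kind of error that rule-based stemmers have to handle with extra conditions and exceptions.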

Author information

Corresponding author

Correspondence to Safaa I. Hajeer.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Hajeer, S.I., Ismail, R.M., Badr, N.L., Tolba, M.F. (2017). A New Stemming Algorithm for Efficient Information Retrieval Systems and Web Search Engines. In: Hassanien, A., Mostafa Fouad, M., Manaf, A., Zamani, M., Ahmad, R., Kacprzyk, J. (eds) Multimedia Forensics and Security. Intelligent Systems Reference Library, vol 115. Springer, Cham. https://doi.org/10.1007/978-3-319-44270-9_6

  • DOI: https://doi.org/10.1007/978-3-319-44270-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44268-6

  • Online ISBN: 978-3-319-44270-9

  • eBook Packages: Engineering, Engineering (R0)
