An Advance Approach for Spam Document Detection Using QAP Rabin-Karp Algorithm

  • Conference paper
Social Networking and Computational Intelligence

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 100)

Abstract

Document spam refers to a document copyright issue: the plagiarism of content from a genuine document into another. Much research is carried out around the globe to drive improvement in fields such as medicine, technology, and agriculture, and original content contributes to that improvement. Many organizations and individuals reuse existing concepts to take credit for others' work. Document spamming is not a legal activity, and researchers have therefore derived many algorithms to counter it. The challenges in such approaches are processing the data and achieving accurate similarity detection. In this paper, a novel QAP-based Rabin-Karp algorithm is proposed. The approach combines score computation using QAP functions with a final similarity measure computation using the Rabin-Karp algorithm. The experimental algorithm is executed using a Java library and sample documents. The algorithm is compared with the traditional approach in terms of similarity measure, computation time, and throughput. The results show an improvement, which indicates the effectiveness of the proposed approach.
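To make the Rabin-Karp part of the abstract concrete, the following is a minimal Java sketch of a rolling-hash similarity measure between two documents. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the QAP scoring functions, so they are not modelled here. The class name `RabinKarpSimilarity`, the methods `windowHashes` and `similarity`, the window length `k`, and the constants `BASE` and `MOD` are all hypothetical choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a plain Rabin-Karp similarity measure.
// The paper's QAP score computation is NOT modelled here.
class RabinKarpSimilarity {
    private static final long BASE = 257L;           // assumed polynomial base
    private static final long MOD = 1_000_000_007L;  // assumed prime modulus

    /** Rolling hash of every k-character window of text, indexed by start position. */
    static long[] windowHashes(String text, int k) {
        int n = text.length() - k + 1;
        if (k <= 0 || n <= 0) {
            return new long[0];
        }
        long[] hashes = new long[n];
        long high = 1; // BASE^(k-1) mod MOD, weight of the outgoing character
        for (int i = 1; i < k; i++) {
            high = (high * BASE) % MOD;
        }
        long hash = 0;
        for (int i = 0; i < k; i++) {
            hash = (hash * BASE + text.charAt(i)) % MOD;
        }
        hashes[0] = hash;
        for (int i = 1; i < n; i++) {
            // Remove the character leaving the window, then append the new one.
            long outgoing = (text.charAt(i - 1) * high) % MOD;
            hash = (hash - outgoing + MOD) % MOD;
            hash = (hash * BASE + text.charAt(i + k - 1)) % MOD;
            hashes[i] = hash;
        }
        return hashes;
    }

    /** Fraction of k-character windows of suspect that also occur in source (0.0 to 1.0). */
    static double similarity(String source, String suspect, int k) {
        long[] sourceHashes = windowHashes(source, k);
        long[] suspectHashes = windowHashes(suspect, k);
        if (suspectHashes.length == 0) {
            return 0.0;
        }
        // Index source windows by hash so each suspect window is looked up once.
        Map<Long, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < sourceHashes.length; i++) {
            index.computeIfAbsent(sourceHashes[i], h -> new ArrayList<>()).add(i);
        }
        int matched = 0;
        for (int j = 0; j < suspectHashes.length; j++) {
            List<Integer> candidates = index.get(suspectHashes[j]);
            if (candidates == null) {
                continue;
            }
            for (int start : candidates) {
                // Verify character by character to rule out hash collisions.
                if (source.regionMatches(start, suspect, j, k)) {
                    matched++;
                    break;
                }
            }
        }
        return (double) matched / suspectHashes.length;
    }

    public static void main(String[] args) {
        String source = "original content drives improvement in many fields";
        String suspect = "original content drives progress in many fields";
        System.out.println(similarity(source, suspect, 5));
    }
}
```

The rolling update lets each window hash be derived from the previous one in constant time, which is what makes Rabin-Karp attractive when many windows must be compared. The explicit `regionMatches` check keeps a hash collision from being counted as a match.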

Author information

Corresponding author

Correspondence to Nidhi Ruthia.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ruthia, N., Tiwary, A. (2020). An Advance Approach for Spam Document Detection Using QAP Rabin-Karp Algorithm. In: Shukla, R., Agrawal, J., Sharma, S., Chaudhari, N., Shukla, K. (eds) Social Networking and Computational Intelligence. Lecture Notes in Networks and Systems, vol 100. Springer, Singapore. https://doi.org/10.1007/978-981-15-2071-6_25
