Abstract
Document spam is a term related to document copyright: it refers to plagiarizing content from a genuine document into another. Research is carried out around the globe to advance fields such as medicine, technology, and agriculture, and original content drives that progress. Many organizations and individuals reuse existing concepts to take credit for others' work. Document spamming is not a legal activity, and researchers have therefore derived many algorithms to counter it. The challenges in such approaches are processing the data and achieving accurate similarity detection. In this paper, a novel QAP-based Rabin-Karp algorithm is proposed. The approach combines score computation using QAP functions with a final similarity measure computed using the Rabin-Karp algorithm. The experimental algorithm is executed using a Java library on sample documents. The algorithm is compared with the traditional approach in terms of similarity measure, computation time, and throughput. The results show an improvement, indicating the effectiveness of the proposed approach.
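To illustrate the Rabin-Karp part of the approach, the following is a minimal Java sketch of rolling-hash fingerprinting used as a document similarity measure. It is not the authors' implementation: the abstract does not define the QAP score functions, so they are left out, and the class name `RabinKarpSimilarity`, the constants `BASE` and `MOD`, the window length `k`, and the use of the Dice coefficient over fingerprint sets are all assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and constants are hypothetical.
final class RabinKarpSimilarity {
    private static final long BASE = 257L;            // assumed hash base
    private static final long MOD = 1_000_000_007L;   // assumed prime modulus

    /** Returns the rolling hashes of every window of k characters in text. */
    public static Set<Long> fingerprints(String text, int k) {
        Set<Long> hashes = new HashSet<>();
        if (k <= 0 || text.length() < k) {
            return hashes; // no complete window exists
        }
        // high = BASE^(k-1) mod MOD, the weight of the leading character.
        long high = 1;
        for (int i = 1; i < k; i++) {
            high = (high * BASE) % MOD;
        }
        // Hash of the first window.
        long h = 0;
        for (int i = 0; i < k; i++) {
            h = (h * BASE + text.charAt(i)) % MOD;
        }
        hashes.add(h);
        // Slide the window: drop the leading character, append the next one.
        for (int i = k; i < text.length(); i++) {
            h = (h - text.charAt(i - k) * high % MOD + MOD) % MOD;
            h = (h * BASE + text.charAt(i)) % MOD;
            hashes.add(h);
        }
        return hashes;
    }

    /**
     * Dice coefficient over the two fingerprint sets (an assumed measure).
     * Hash collisions can slightly overstate the overlap, because hashes
     * are compared without verifying the underlying substrings.
     */
    public static double similarity(String a, String b, int k) {
        Set<Long> fa = fingerprints(a, k);
        Set<Long> fb = fingerprints(b, k);
        if (fa.isEmpty() || fb.isEmpty()) {
            return 0.0;
        }
        Set<Long> common = new HashSet<>(fa);
        common.retainAll(fb);
        return 2.0 * common.size() / (fa.size() + fb.size());
    }
}
```

Calling `RabinKarpSimilarity.similarity(docA, docB, 5)` on two document strings would return a value between 0 and 1, with higher values indicating more shared 5-character windows. A stricter variant would confirm each hash match by comparing the substrings directly, as the classic Rabin-Karp matcher does.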
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ruthia, N., Tiwary, A. (2020). An Advance Approach for Spam Document Detection Using QAP Rabin-Karp Algorithm. In: Shukla, R., Agrawal, J., Sharma, S., Chaudhari, N., Shukla, K. (eds) Social Networking and Computational Intelligence. Lecture Notes in Networks and Systems, vol 100. Springer, Singapore. https://doi.org/10.1007/978-981-15-2071-6_25
DOI: https://doi.org/10.1007/978-981-15-2071-6_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2070-9
Online ISBN: 978-981-15-2071-6
eBook Packages: Intelligent Technologies and Robotics (R0)