Abstract
The dark web provides features such as encryption and routing changes that ensure anonymity and make tracking difficult. Cybercriminals exploit these characteristics to generate revenue, distributing illegal and cybercrime content through the dark web as a business strategy. Such content includes drug and arms trafficking, counterfeit documents, malware, and the sale of personal information. Text crawling systems for the dark web have been developed and studied to counter the distribution of this content. However, because traditional dark web text crawlers collect all of the text on a page, identifying the exact type of content can be difficult when a single dark web page serves several types of illegal and cybercrime content. In this paper, we propose a method that uses the text embedded within images to accurately identify the types of illegal and cybercrime content on the dark web. We conducted experiments that combine the text of web pages with the text extracted from their images to identify illegal and cybercrime content types. We collected keywords for three types of illegal and cybercrime content, and we identified the distribution and types of this content by checking whether the collected keywords appeared in dark web pages. The experiments confirmed that using text embedded within images improves identification performance. Our proposed method accurately identified over 90% of the dark web pages that distributed drugs.
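The identification step described above (merge page text with text recovered from images, then check which content type's keywords appear) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword lists, the type names, the word-level tokenization, and the "most keyword hits wins" rule are all hypothetical choices made for the example. In practice `image_texts` would come from an OCR engine (for instance Tesseract via `pytesseract.image_to_string`); which OCR engine the paper actually used is not stated in the abstract.

```python
import re
from typing import Dict, List, Optional, Set

# HYPOTHETICAL keyword sets: the paper's actual keywords and type names are not given here.
KEYWORDS: Dict[str, Set[str]] = {
    "drugs": {"cocaine", "heroin", "mdma", "cannabis"},
    "weapons": {"pistol", "rifle", "ammunition"},
    "counterfeit": {"passport", "banknote", "license"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_hits(text: str, keywords: Dict[str, Set[str]] = KEYWORDS) -> Dict[str, int]:
    """Count how many distinct keywords of each content type occur in the text."""
    tokens = tokenize(text)
    return {ctype: len(words & tokens) for ctype, words in keywords.items()}


def identify_content_type(
    page_text: str,
    image_texts: List[str],
    keywords: Dict[str, Set[str]] = KEYWORDS,
) -> Optional[str]:
    """Merge page text with OCR'd image text and return the type with the most hits.

    Returns None when no keyword of any type appears. Ties go to the type
    listed first in `keywords` (an arbitrary, assumed tie-break rule).
    """
    combined = " ".join([page_text, *image_texts])
    hits = keyword_hits(combined, keywords)
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


if __name__ == "__main__":
    page = "Welcome to our shop. Fast shipping worldwide."
    print(identify_content_type(page, []))                            # page text only
    print(identify_content_type(page, ["Premium MDMA and cocaine"]))  # plus image text
```

The usage lines mirror the paper's motivation: a storefront whose HTML text is generic yields no match (`None`), while the same page plus the text recovered from a product image is labelled `drugs`. Matching on whole word tokens rather than raw substrings avoids false hits such as "license" inside "licensed", at the cost of missing inflected forms.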
Acknowledgement
This work was supported by the Nuclear Safety Research Program through the Korea Foundation of Nuclear Safety (KoFONS) using the financial resource granted by the Nuclear Safety and Security Commission (NSSC) of the Republic of Korea (No. 2106058).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kim, D., Jeon, S., Shin, J., Seo, J.T. (2024). OCR Meets the Dark Web: Identifying the Content Type Regarding Illegal and Cybercrime. In: Kim, H., Youn, J. (eds) Information Security Applications. WISA 2023. Lecture Notes in Computer Science, vol 14402. Springer, Singapore. https://doi.org/10.1007/978-981-99-8024-6_16
DOI: https://doi.org/10.1007/978-981-99-8024-6_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8023-9
Online ISBN: 978-981-99-8024-6
eBook Packages: Computer Science, Computer Science (R0)