OCR Meets the Dark Web: Identifying the Content Type Regarding Illegal and Cybercrime

  • Conference paper
Information Security Applications (WISA 2023)

Abstract

The dark web uses features such as encryption and routing changes to ensure anonymity and make tracking difficult. Cybercriminals exploit these characteristics as a business strategy, earning revenue by distributing illegal and cybercrime content through the dark web. Illegal and cybercrime content includes drug and arms trafficking, counterfeit documents, malware, and the sale of personal information. Text crawling systems for the dark web have been developed and studied to counter the distribution of such content. However, because a traditional dark web text crawler collects all text on a page, identifying the exact content type can be difficult when a page serves several types of illegal and cybercrime content. In this paper, we propose a method that uses the text embedded within images to accurately identify the types of illegal and cybercrime content on the dark web. We conducted experiments that combine the text of web pages with the text extracted from their images to identify illegal and cybercrime content types. We collected keywords for three types of illegal and cybercrime content. The distribution and types of illegal and cybercrime content were identified by checking whether the collected keywords appeared in dark web pages. Through experiments, we confirmed that using text embedded within images improves performance. Our proposed method accurately identified over 90% of the dark web pages on which drugs were distributed.
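
The keyword check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names, the keyword lists, and the function names below are hypothetical, since the abstract does not list the three content types or their keywords. The sketch also assumes that the text embedded in images has already been extracted by some OCR engine and is passed in as plain strings.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical categories and keywords, for illustration only.
# The abstract does not give the actual keyword lists.
KEYWORDS: Dict[str, Set[str]] = {
    "drugs": {"cocaine", "mdma", "cannabis"},
    "weapons": {"pistol", "rifle", "ammo"},
    "counterfeit": {"passport", "banknote", "license"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_keywords(
    page_text: str, image_texts: Iterable[str] = ()
) -> Dict[str, List[str]]:
    """Return, per category, the keywords found in the page text
    combined with any OCR-extracted image text."""
    tokens = tokenize(page_text)
    for image_text in image_texts:
        tokens |= tokenize(image_text)
    return {cat: sorted(words & tokens) for cat, words in KEYWORDS.items()}


def identify_types(page_text: str, image_texts: Iterable[str] = ()) -> List[str]:
    """List every category with at least one matching keyword."""
    hits = match_keywords(page_text, image_texts)
    return [cat for cat, found in hits.items() if found]


# A page whose HTML text is generic but whose images carry product names.
page = "Welcome to our shop. Fast shipping worldwide."
ocr_output = ["Pure COCAINE 1g", "MDMA pills"]

print(identify_types(page))              # page text only
print(identify_types(page, ocr_output))  # page text plus image text
```

In this toy case the page text alone matches no category, while adding the image text surfaces the drug-related keywords, which is the effect the abstract attributes to using text embedded within images.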

Acknowledgement

This work was supported by the Nuclear Safety Research Program through the Korea Foundation of Nuclear Safety (KoFONS) using the financial resource granted by the Nuclear Safety and Security Commission (NSSC) of the Republic of Korea (No. 2106058).

Author information

Corresponding author

Correspondence to Jung Taek Seo.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kim, D., Jeon, S., Shin, J., Seo, J.T. (2024). OCR Meets the Dark Web: Identifying the Content Type Regarding Illegal and Cybercrime. In: Kim, H., Youn, J. (eds) Information Security Applications. WISA 2023. Lecture Notes in Computer Science, vol 14402. Springer, Singapore. https://doi.org/10.1007/978-981-99-8024-6_16

  • DOI: https://doi.org/10.1007/978-981-99-8024-6_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8023-9

  • Online ISBN: 978-981-99-8024-6

  • eBook Packages: Computer Science, Computer Science (R0)
