Design and evaluation of highly accurate smart contract code vulnerability detection framework

Jeon, Sowon; Lee, Gilhee; Kim, Hyoungshick; Woo, Simon S.

doi:10.1007/s10618-023-00981-1

Published: 13 October 2023


Data Mining and Knowledge Discovery

Abstract

Smart contracts are self-executing programs stored and executed on a blockchain platform. However, previous studies have shown that developing secure smart contracts is difficult, and insecure smart contracts can cause significant financial losses for service providers and customers. Identifying security vulnerabilities in smart contracts is therefore essential for blockchain platforms that use them. In this paper, we present SmartConDetect, a tool for detecting security vulnerabilities in Solidity smart contracts. SmartConDetect is a static analysis tool that extracts code fragments from Solidity smart contracts and uses a pre-trained BERT model to identify vulnerable code patterns. To evaluate SmartConDetect, we use two public datasets and our own dataset (SmartConDataset), collected from the real-world Ethereum blockchain network. Our experimental results show that SmartConDetect significantly outperforms existing state-of-the-art methods, achieving an F1-score of 90.9% on our own dataset. In addition, SmartConDetect is about twice as fast as SmartCheck at detection. Finally, we conduct a real-world case study to analyze the distribution of detected vulnerabilities.
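The abstract states only that SmartConDetect extracts code fragments from Solidity source and classifies them with a pre-trained BERT model; it does not say how fragments are delimited. To illustrate what the first stage might look like, the sketch below assumes function-level fragments (bodies of `function`, `modifier`, `constructor`, `fallback` and `receive` declarations) located with a regex and brace matching. The name `extract_fragments`, the sample contract, and the splitting heuristic are hypothetical and are not the authors' implementation; the BERT classification stage is omitted.

```python
import re
from typing import List

# Hypothetical fragment boundaries: function-like Solidity declarations.
_HEADER = re.compile(r"\b(function|modifier|constructor|fallback|receive)\b")


def _strip_comments(src: str) -> str:
    """Remove /* */ block comments and // line comments.

    Naive: comment markers inside string literals are not handled.
    """
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)
    return re.sub(r"//[^\n]*", "", src)


def extract_fragments(src: str) -> List[str]:
    """Split Solidity source into whitespace-normalized function-level fragments.

    Declarations without a body (ending in ';', e.g. in interfaces) are skipped.
    """
    src = _strip_comments(src)
    fragments: List[str] = []
    pos = 0
    while True:
        m = _HEADER.search(src, pos)
        if m is None:
            break
        brace = src.find("{", m.end())
        semi = src.find(";", m.end())
        if brace == -1 or (semi != -1 and semi < brace):
            # Bodiless declaration: continue scanning after it.
            pos = semi + 1 if semi != -1 else m.end()
            continue
        # Walk forward from the opening brace until the braces balance.
        depth, i = 0, brace
        while i < len(src):
            if src[i] == "{":
                depth += 1
            elif src[i] == "}":
                depth -= 1
                if depth == 0:
                    break
            i += 1
        fragments.append(" ".join(src[m.start():i + 1].split()))
        pos = i + 1
    return fragments


SAMPLE = """
pragma solidity ^0.8.0;
interface IToken { function transfer(address to, uint v) external returns (bool); }
contract Bank {
    mapping(address => uint) balances;
    // external call happens before the balance update
    function withdraw(uint amount) public {
        require(balances[msg.sender] >= amount);
        (bool ok, ) = msg.sender.call{value: amount}("");
        require(ok);
        balances[msg.sender] -= amount;
    }
    function deposit() public payable { balances[msg.sender] += msg.value; }
}
"""

if __name__ == "__main__":
    # Each printed fragment would be the unit passed to a classifier.
    for frag in extract_fragments(SAMPLE):
        print(frag)
```

On the sample, the bodiless interface declaration is skipped and two fragments (`withdraw` and `deposit`) are produced. A production extractor would more likely walk a parser's AST (for example, the JSON AST that `solc` can emit) instead of using regexes, because braces or comment markers inside string literals would confuse this heuristic.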

References

Acampora G, Cosma G (2015) A fuzzy-based approach to programming language independent source-code plagiarism detection. In: 2015 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 1–8. https://doi.org/10.1109/FUZZ-IEEE.2015.7337935
Alqarni M, Azim A (2022) Low level source code vulnerability detection using advanced BERT language model. https://assets.pubpub.org/bbi2k2lr/31652980468154.pdf. Accessed 03 Sept 2022
Ashizawa N, Yanai N, Cruz JP, Okamura S (2021) Eth2vec: learning contract-wide code representations for vulnerability detection on ethereum smart contracts. arXiv preprint arXiv:2101.02377
Atzei N, Bartoletti M, Cimoli T (2017) A survey of attacks on ethereum smart contracts (SoK). In: Proceedings of the 6th international conference on principles of security and trust, Springer, Berlin, Heidelberg, Vol. 10204, pp 164–186. https://doi.org/10.1007/978-3-662-54455-6_8
Beautiful Soup documentation (Beautiful Soup 4.9.0). https://www.crummy.com/software/BeautifulSoup/bs4/doc/. Accessed 26 Jan 2022
Bodden E (2012) Inter-procedural data-flow analysis with ifds/ide and soot. In: Proceedings of the ACM SIGPLAN international workshop on state of the art in java program analysis. SOAP ’12. Association for Computing Machinery, New York, pp 3–8. https://doi.org/10.1145/2259051.2259052
Buratti L, Pujar S, Bornea M, McCarley S, Zheng Y, Rossiello G, Morari A, Laredo J, Thost V, Zhuang Y, et al (2020) Exploring software naturalness through neural language models. arXiv preprint arXiv:2006.12641
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Ding SHH (2021) Kam1n0 Server. McGill University. https://github.com/McGill-DMaS/Kam1n0-Community
Durieux T, Ferreira JF, Abreu R, Cruz P (2020) Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 530–541
Ethereum (2021) Gas and fees at ethereum development documentation. https://ethereum.org/ko/developers/docs/gas/. Accessed 1 Dec 2021
Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, et al (2020) CodeBERT: a pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155
GitHub SD (2021) Solidity. Ethereum. https://docs.soliditylang.org/en/v0.8.4/
Guo D, Lu S, Duan N, Wang Y, Zhou M, Yin J (2022) UniXcoder: unified cross-modal pre-training for code representation. arXiv preprint arXiv:2203.03850. https://doi.org/10.48550/ARXIV.2203.03850
Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S, Tufano M, Deng SK, Clement C, Drain D, Sundaresan N, Yin J, Jiang D, Zhou M (2021) GraphCodeBERT: pre-training code representations with data flow. In: International conference on learning representations. https://openreview.net/forum?id=jLoC4ez43PZ
Hauser B (2021) py-solc. Ethereum. https://github.com/ethereum/py-solc
Hewa T, Ylianttila M, Liyanage M (2021) Survey on blockchain based smart contracts: applications, opportunities and challenges. J Netw Comput Appl 177:102857. https://doi.org/10.1016/j.jnca.2020.102857
Jamin S, Jin C, Kurc AR, Raz D, Shavitt Y (2020) Smart contract vulnerability detection using graph neural network. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI 2020
Jeon S, Lee G, Kim H, Woo SS (2021) SmartConDetect: highly accurate smart contract code vulnerability detection mechanism using BERT. In: KDD workshop on programming language processing (PLP)
Jiang L, Misherghi G, Su Z, Glondu S (2007) Deckard: scalable and accurate tree-based detection of code clones. In: 29th international conference on software engineering (ICSE’07), IEEE, pp 96–105
Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. In: International conference on machine learning, PMLR, pp 5110–5121
Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D, et al (2021) CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664
Lutz O, Chen H, Fereidooni H, Sendner C, Dmitrienko A, Sadeghi AR, Koushanfar F (2021) ESCORT: ethereum smart COntRacTs vulnerability detection using deep neural network and transfer learning. arXiv preprint arXiv:2103.12607
Momeni P, Wang Y, Samavi R (2019) Machine learning model for smart contracts security analysis. In: 2019 17th international conference on privacy, security and trust (PST), IEEE, pp 1–6
Palladino S (2017) The parity wallet hack explained. OpenZeppelin blog. https://blog.openzeppelin.com/on-the-parity-wallet-multisig-hack-405a8c12e8f7
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
Popper N (2016) A hacking of more than $50 million dashes hopes in the world of virtual currency. The New York Times
Reproducibility: PyTorch 1.13 documentation. https://pytorch.org/docs/stable/notes/randomness.html. Accessed 24 Dec 2022
Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, Ellingwood P, McConley M (2018) Automated vulnerability detection in source code using deep representation learning. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp 757–762. https://doi.org/10.1109/ICMLA.2018.00120
SafeMath (2017) https://github.com/OpenZeppelin/zeppelin-solidity/blob/master/contracts/math/SafeMath.sol
Samreen NF, Alalfi MH (2021) A survey of security vulnerabilities in ethereum smart contracts. arXiv preprint arXiv:2105.06974
Solidity v0.5.0 Breaking Changes (2016) https://docs.soliditylang.org/en/latest/050-breaking-changes.html
Swathi B, Anju R (2019) Reformulation of natural language queries on source code base using NLP techniques. Compusoft 8(2):3047–3052
Szabo N (1997) Formalizing and securing relationships on public networks. First Monday
Team TE (2021) Ethereum (ETH) blockchain explorer. https://etherscan.io/. Accessed 21 May 2021
Tikhomirov S, Voskresenskaya E, Ivanitskiy I, Takhaviev R, Marchenko E, Alexandrov Y (2018) SmartCheck: static analysis of ethereum smart contracts. In: 2018 IEEE/ACM 1st international workshop on emerging trends in software engineering for blockchain (WETSEB), pp 9–16
van Dam JK (2016) Identifying source code programming languages through natural language processing. MS thesis, Faculty Sci., Math. Inform., Univ. Amsterdam, Amsterdam
Wang W, Song J, Xu G, Li Y, Wang H, Su C (2020) ContractWard: automated vulnerability detection models for ethereum smart contracts. IEEE Trans Netw Sci Eng 8(2):1133–1144
Wang X, Wang Y, Mi F, Zhou P, Wan Y, Liu X, Li L, Wu H, Liu J, Jiang X (2021) SynCoBERT: syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556. https://doi.org/10.48550/ARXIV.2108.04556
Wu J (2021) Literature review on vulnerability detection using NLP technology
Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696

Acknowledgements

We thank the anonymous reviewers and the editor for their invaluable comments and feedback, which greatly improved this work. This work was partly supported by Institute for Information & communication Technology Planning & Evaluation (IITP) grants funded by the Korean government (MSIT): (No. 2022-0-01199, Graduate School of Convergence Security at Sungkyunkwan University), (No. 2022-0-01045, Self-directed Multi-Modal Intelligence for solving unknown, open domain problems), (No. 2022-0-00688, AI Platform to Fully Adapt and Reflect Privacy-Policy Changes), (No. 2021-0-02068, Artificial Intelligence Innovation Hub), (No. 2019-0-00421, AI Graduate School Support Program at Sungkyunkwan University), and (No. RS-2023-00230337, Advanced and Proactive AI Platform Research and Development Against Malicious deepfakes).

Author information

Authors and Affiliations

College of Computing and Informatics, Sungkyunkwan University, Seoburo, Suwon, Gyeonggi-do, 16419, South Korea
Sowon Jeon, Hyoungshick Kim & Simon S. Woo
Department of Electrical and Computer Engineering, Sungkyunkwan University, Seoburo, Suwon, Gyeonggi-do, 16419, South Korea
Gilhee Lee

Corresponding authors

Correspondence to Hyoungshick Kim or Simon S. Woo.

Ethics declarations

Conflict of interest

The authors have declared no conflicts of interest for this article.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jeon, S., Lee, G., Kim, H. et al. Design and evaluation of highly accurate smart contract code vulnerability detection framework. Data Min Knowl Disc (2023). https://doi.org/10.1007/s10618-023-00981-1

Received: 01 December 2021
Accepted: 04 September 2023
DOI: https://doi.org/10.1007/s10618-023-00981-1
