
Design and evaluation of highly accurate smart contract code vulnerability detection framework

Data Mining and Knowledge Discovery

Abstract

Smart contracts are self-executing programs stored and executed on a blockchain platform. However, previous studies have shown that developing secure smart contracts is difficult, and insecure smart contracts can cause significant financial losses for service providers and customers. Identifying security vulnerabilities in smart contracts is therefore essential for blockchain platforms that use them. In this paper, we present SmartConDetect, a tool for detecting security vulnerabilities in Solidity smart contracts. SmartConDetect is a static analysis tool that extracts code fragments from Solidity smart contracts and uses a pre-trained BERT model to find vulnerable code patterns. To evaluate SmartConDetect, we use two public datasets and our own dataset (SmartConDataset), collected from the real-world Ethereum blockchain network. Our experimental results show that SmartConDetect significantly outperforms the state-of-the-art methods it is compared against, achieving an F1-score of 90.9% on our own dataset. SmartConDetect is also about twice as fast as SmartCheck in detection. Furthermore, we conduct a real-world case study to analyze the distribution of detected vulnerabilities.
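The abstract says SmartConDetect extracts code fragments from Solidity contracts before a pre-trained BERT model classifies them. This preview does not state the exact fragment granularity, so the sketch below assumes function-level fragments (functions, modifiers, constructors, `fallback`/`receive`) and locates them by brace matching. `extract_fragments` and `Fragment` are hypothetical names, not the authors' code, and the heuristic skips comments but not string literals, so it is only a rough stand-in for a real parser (for example, an AST produced by the Solidity compiler).

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed fragment granularity: function-like definitions only.
_HEADER = re.compile(r"\b(function|modifier|constructor|fallback|receive)\b")


@dataclass
class Fragment:
    kind: str  # keyword that opened the fragment, e.g. "function"
    text: str  # original source from the keyword to the matching '}'


def _blank(match) -> str:
    # Replace every non-newline character with a space so offsets stay aligned.
    return re.sub(r"[^\n]", " ", match.group(0))


def _blank_comments(src: str) -> str:
    """Blank out /* */ and // comments so braces inside them are ignored.

    Simplification: comment markers inside string literals are not handled.
    """
    src = re.sub(r"/\*.*?\*/", _blank, src, flags=re.S)
    return re.sub(r"//[^\n]*", _blank, src)


def extract_fragments(src: str) -> List[Fragment]:
    """Split Solidity source into function-level fragments by brace matching."""
    clean = _blank_comments(src)
    fragments: List[Fragment] = []
    pos = 0
    while True:
        m = _HEADER.search(clean, pos)
        if m is None:
            break
        open_idx = clean.find("{", m.end())
        semi_idx = clean.find(";", m.end())
        # A ';' before any '{' means a body-less declaration (interface,
        # abstract function, or function-type variable): skip it.
        if open_idx == -1 or (semi_idx != -1 and semi_idx < open_idx):
            pos = m.end()
            continue
        depth, end = 0, None
        for i in range(open_idx, len(clean)):
            if clean[i] == "{":
                depth += 1
            elif clean[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:  # unbalanced braces: stop rather than guess
            break
        # Offsets match because blanking preserves length.
        fragments.append(Fragment(m.group(1), src[m.start():end]))
        pos = end
    return fragments


if __name__ == "__main__":
    sample = """
    pragma solidity ^0.4.24;
    contract Bank {
        mapping(address => uint) balances;
        modifier onlyPositive() { require(balances[msg.sender] > 0); _; }
        // function inComment() { }
        function withdraw() public onlyPositive {
            uint amount = balances[msg.sender];
            if (msg.sender.call.value(amount)()) { balances[msg.sender] = 0; }
        }
    }
    """
    for frag in extract_fragments(sample):
        print(frag.kind, "->", frag.text.splitlines()[0])
```

Running the example should print one `modifier` fragment and one `withdraw` function fragment; the commented-out function is ignored. In a pipeline like the one the abstract describes, each fragment's text would then be tokenized and passed to the classifier, a step omitted here.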


[Figs. 1–15: images not available in this preview]


References

  • Acampora G, Cosma G (2015) A fuzzy-based approach to programming language independent source-code plagiarism detection. In: 2015 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 1–8. https://doi.org/10.1109/FUZZ-IEEE.2015.7337935

  • Alqarni M, Azim A (2022) Low level source code vulnerability detection using advanced BERT language model. https://assets.pubpub.org/bbi2k2lr/31652980468154.pdf. Accessed 03 Sept 2022

  • Ashizawa N, Yanai N, Cruz JP, Okamura S (2021) Eth2vec: learning contract-wide code representations for vulnerability detection on ethereum smart contracts. arXiv preprint arXiv:2101.02377

  • Atzei N, Bartoletti M, Cimoli T (2017) A survey of attacks on Ethereum smart contracts (SoK). In: Proceedings of the 6th international conference on principles of security and trust, vol 10204. Springer, Berlin, Heidelberg, pp 164–186. https://doi.org/10.1007/978-3-662-54455-6_8

  • Beautiful Soup documentation (Beautiful Soup 4.9.0). https://www.crummy.com/software/BeautifulSoup/bs4/doc/. Accessed 26 Jan 2022

  • Bodden E (2012) Inter-procedural data-flow analysis with IFDS/IDE and Soot. In: Proceedings of the ACM SIGPLAN international workshop on state of the art in Java program analysis. SOAP ’12. Association for Computing Machinery, New York, pp 3–8. https://doi.org/10.1145/2259051.2259052

  • Buratti L, Pujar S, Bornea M, McCarley S, Zheng Y, Rossiello G, Morari A, Laredo J, Thost V, Zhuang Y, et al (2020) Exploring software naturalness through neural language models. arXiv preprint arXiv:2006.12641

  • Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805

  • Ding SHH (2021) Kam1n0 Server. McGill University. https://github.com/McGill-DMaS/Kam1n0-Community

  • Durieux T, Ferreira JF, Abreu R, Cruz P (2020) Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 530–541

  • Ethereum (2021) Gas and fees. Ethereum development documentation. https://ethereum.org/ko/developers/docs/gas/. Accessed 1 Dec 2021

  • Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, et al (2020) CodeBERT: a pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155

  • GitHub SD (2021) Solidity. Ethereum. https://docs.soliditylang.org/en/v0.8.4/

  • Guo D, Lu S, Duan N, Wang Y, Zhou M, Yin J (2022) UniXcoder: unified cross-modal pre-training for code representation. arXiv preprint arXiv:2203.03850. https://doi.org/10.48550/ARXIV.2203.03850

  • Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S, Tufano M, Deng SK, Clement C, Drain D, Sundaresan N, Yin J, Jiang D, Zhou M (2021) GraphCodeBERT: pre-training code representations with data flow. In: International conference on learning representations. https://openreview.net/forum?id=jLoC4ez43PZ

  • Hauser B (2021) py-solc. Ethereum. https://github.com/ethereum/py-solc

  • Hewa T, Ylianttila M, Liyanage M (2021) Survey on blockchain based smart contracts: applications, opportunities and challenges. J Netw Comput Appl 177:102857. https://doi.org/10.1016/j.jnca.2020.102857


  • Zhuang Y, Liu Z, Qian P, Liu Q, Wang X, He Q (2020) Smart contract vulnerability detection using graph neural network. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI 2020

  • Jeon S, Lee G, Kim H, Woo SS (2021) SmartConDetect: highly accurate smart contract code vulnerability detection mechanism using BERT. In: KDD workshop on programming language processing (PLP)

  • Jiang L, Misherghi G, Su Z, Glondu S (2007) Deckard: scalable and accurate tree-based detection of code clones. In: 29th international conference on software engineering (ICSE’07), IEEE, pp 96–105

  • Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. In: International conference on machine learning, PMLR, pp 5110–5121

  • Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D, et al (2021) CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664

  • Lutz O, Chen H, Fereidooni H, Sendner C, Dmitrienko A, Sadeghi AR, Koushanfar F (2021) ESCORT: ethereum smart COntRacTs vulnerability detection using deep neural network and transfer learning. arXiv preprint arXiv:2103.12607

  • Momeni P, Wang Y, Samavi R (2019) Machine learning model for smart contracts security analysis. In: 2019 17th international conference on privacy, security and trust (PST), IEEE, pp 1–6

  • Palladino S (2017) The Parity wallet hack explained. OpenZeppelin blog. https://blog.openzeppelin.com/on-the-parity-wallet-multisig-hack-405a8c12e8f7

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830


  • Popper N (2016) A hacking of more than $50 million dashes hopes in the world of virtual currency. The New York Times

  • Reproducibility: PyTorch 1.13 documentation. https://pytorch.org/docs/stable/notes/randomness.html. Accessed 24 Dec 2022

  • Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, Ellingwood P, McConley M (2018) Automated vulnerability detection in source code using deep representation learning. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp 757–762. https://doi.org/10.1109/ICMLA.2018.00120

  • SafeMath (2017) https://github.com/OpenZeppelin/zeppelin-solidity/blob/master/contracts/math/SafeMath.sol

  • Samreen NF, Alalfi MH (2021) A survey of security vulnerabilities in ethereum smart contracts. arXiv preprint arXiv:2105.06974

  • Solidity v0.5.0 Breaking Changes (2016) https://docs.soliditylang.org/en/latest/050-breaking-changes.html#

  • Swathi B, Anju R (2019) Reformulation of natural language queries on source code base using NLP techniques. Compusoft 8(2):3047–3052


  • Szabo N (1997) Formalizing and securing relationships on public networks. First Monday

  • Etherscan Team (2021) Ethereum (ETH) blockchain explorer. https://etherscan.io/. Accessed 21 May 2021

  • Tikhomirov S, Voskresenskaya E, Ivanitskiy I, Takhaviev R, Marchenko E, Alexandrov Y (2018) SmartCheck: static analysis of Ethereum smart contracts. In: 2018 IEEE/ACM 1st international workshop on emerging trends in software engineering for blockchain (WETSEB), pp 9–16

  • van Dam JK (2016) Identifying source code programming languages through natural language processing. MS thesis, Faculty of Science, Mathematics and Informatics, University of Amsterdam, Amsterdam

  • Wang W, Song J, Xu G, Li Y, Wang H, Su C (2020) ContractWard: automated vulnerability detection models for Ethereum smart contracts. IEEE Trans Netw Sci Eng 8(2):1133–1144


  • Wang X, Wang Y, Mi F, Zhou P, Wan Y, Liu X, Li L, Wu H, Liu J, Jiang X (2021) SynCoBERT: syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556. https://doi.org/10.48550/ARXIV.2108.04556

  • Wu J (2021) Literature review on vulnerability detection using NLP technology

  • Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696


Acknowledgements

We thank the anonymous reviewers and the editor for their invaluable comments and feedback, which greatly improved this work. This work was partly supported by Institute for Information & Communication Technology Planning & Evaluation (IITP) grants funded by the Korean government (MSIT): (No. 2022-0-01199, Graduate School of Convergence Security at Sungkyunkwan University), (No. 2022-0-01045, Self-directed Multi-Modal Intelligence for solving unknown, open domain problems), (No. 2022-0-00688, AI Platform to Fully Adapt and Reflect Privacy-Policy Changes), (No. 2021-0-02068, Artificial Intelligence Innovation Hub), (No. 2019-0-00421, AI Graduate School Support Program at Sungkyunkwan University), and (No. RS-2023-00230337, Advanced and Proactive AI Platform Research and Development Against Malicious deepfakes).

Author information


Corresponding authors

Correspondence to Hyoungshick Kim or Simon S. Woo.

Ethics declarations

Conflict of interest

The authors have declared no conflicts of interest for this article.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jeon, S., Lee, G., Kim, H. et al. Design and evaluation of highly accurate smart contract code vulnerability detection framework. Data Min Knowl Disc (2023). https://doi.org/10.1007/s10618-023-00981-1


  • DOI: https://doi.org/10.1007/s10618-023-00981-1
