Abstract
Smart contracts are self-executing programs stored and executed on a blockchain platform. Previous studies have shown, however, that developing secure smart contracts is not easy, and insecure smart contracts can cause significant financial losses for service providers and customers. Identifying security vulnerabilities in smart contracts is therefore essential for blockchain platforms that use them. In this paper, we present SmartConDetect, a tool for detecting security vulnerabilities in Solidity smart contracts. SmartConDetect is a static analysis tool that extracts code fragments from Solidity smart contracts and uses a pre-trained BERT model to find susceptible code patterns. To evaluate SmartConDetect, we use two public datasets and our own dataset (SmartConDataset), collected from the real-world Ethereum blockchain network. Our experimental results show that SmartConDetect significantly outperforms the state-of-the-art methods it is compared with, achieving an F1-score of 90.9% on our own dataset. In addition, SmartConDetect is about twice as fast as SmartCheck at detection. Finally, we conduct a real-world case study to analyze the distribution of detected vulnerabilities.
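The abstract does not say how SmartConDetect divides a contract into code fragments before passing them to BERT. As a rough illustration only, the sketch below assumes that fragments are function-, modifier-, and constructor-level bodies located by brace matching. The names `extract_fragments`, `_matching_brace`, `HEADER`, and `SAMPLE` are hypothetical. The BERT classification step is not reproduced, and a real tool would more plausibly work from the compiler's AST than from raw text.

```python
import re
from typing import List

# Hypothetical fragment boundary: a function, modifier, or constructor header.
HEADER = re.compile(r"\b(function|modifier|constructor)\b")


def _matching_brace(source: str, open_idx: int) -> int:
    """Return the index of the '}' closing the '{' at open_idx, or -1."""
    depth = 0
    for i in range(open_idx, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return i
    return -1


def extract_fragments(source: str) -> List[str]:
    """Split Solidity source text into function/modifier/constructor fragments.

    Assumption for illustration: braces inside comments or string literals
    are not handled, so this is a text heuristic, not a parser.
    """
    fragments: List[str] = []
    pos = 0
    while True:
        match = HEADER.search(source, pos)
        if match is None:
            break
        open_idx = source.find("{", match.end())
        semi_idx = source.find(";", match.end())
        if open_idx == -1:
            break  # no bodies remain anywhere in the file
        if semi_idx != -1 and semi_idx < open_idx:
            pos = semi_idx + 1  # bodiless declaration, e.g. in an interface
            continue
        end_idx = _matching_brace(source, open_idx)
        if end_idx == -1:
            break  # unbalanced braces: stop rather than guess
        fragments.append(source[match.start():end_idx + 1])
        pos = end_idx + 1  # resume after the body so fragments never overlap
    return fragments


SAMPLE = """
pragma solidity ^0.8.0;
interface IToken { function transfer(address to, uint v) external returns (bool); }
contract Bank {
    mapping(address => uint) balances;
    modifier nonZero(uint v) { require(v > 0); _; }
    function withdraw(uint amount) public nonZero(amount) {
        if (balances[msg.sender] >= amount) {
            (bool ok, ) = msg.sender.call{value: amount}("");
            require(ok);
            balances[msg.sender] -= amount;
        }
    }
}
"""

if __name__ == "__main__":
    for frag in extract_fragments(SAMPLE):
        print(frag.splitlines()[0])
    # modifier nonZero(uint v) { require(v > 0); _; }
    # function withdraw(uint amount) public nonZero(amount) {
```

Matching braces, rather than stopping at the first `}`, keeps nested blocks such as the `if` body and the `call{value: amount}` options inside the `withdraw` fragment. That is where a reentrancy-style pattern (external call before the balance update) would have to be visible to any downstream classifier.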
References
Acampora G, Cosma G (2015) A fuzzy-based approach to programming language independent source-code plagiarism detection. In: 2015 IEEE international conference on fuzzy systems (FUZZ-IEEE), pp 1–8. https://doi.org/10.1109/FUZZ-IEEE.2015.7337935
Alqarni M, Azim A (2022) Low level source code vulnerability detection using advanced BERT language model. https://assets.pubpub.org/bbi2k2lr/31652980468154.pdf. Accessed 03 Sept 2022
Ashizawa N, Yanai N, Cruz JP, Okamura S (2021) Eth2Vec: learning contract-wide code representations for vulnerability detection on Ethereum smart contracts. arXiv preprint arXiv:2101.02377
Atzei N, Bartoletti M, Cimoli T (2017) A survey of attacks on Ethereum smart contracts (SoK). In: Proceedings of the 6th international conference on principles of security and trust. Springer, Berlin, Heidelberg, vol 10204, pp 164–186. https://doi.org/10.1007/978-3-662-54455-6_8
Beautiful Soup documentation (Beautiful Soup 4.9.0). https://www.crummy.com/software/BeautifulSoup/bs4/doc/. Accessed 26 Jan 2022
Bodden E (2012) Inter-procedural data-flow analysis with IFDS/IDE and Soot. In: Proceedings of the ACM SIGPLAN international workshop on state of the art in Java program analysis (SOAP ’12). Association for Computing Machinery, New York, pp 3–8. https://doi.org/10.1145/2259051.2259052
Buratti L, Pujar S, Bornea M, McCarley S, Zheng Y, Rossiello G, Morari A, Laredo J, Thost V, Zhuang Y, et al (2020) Exploring software naturalness through neural language models. arXiv preprint arXiv:2006.12641
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Ding SHH (2021) Kam1n0 Server. McGill University. https://github.com/McGill-DMaS/Kam1n0-Community
Durieux T, Ferreira JF, Abreu R, Cruz P (2020) Empirical review of automated analysis tools on 47,587 ethereum smart contracts. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, pp 530–541
Ethereum (2021) Gas and fees. Ethereum development documentation. https://ethereum.org/ko/developers/docs/gas/. Accessed 1 Dec 2021
Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, et al (2020) CodeBERT: a pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155
GitHub SD (2021) Solidity. Ethereum. https://docs.soliditylang.org/en/v0.8.4/
Guo D, Lu S, Duan N, Wang Y, Zhou M, Yin J (2022) UniXcoder: unified cross-modal pre-training for code representation. arXiv preprint arXiv:2203.03850. https://doi.org/10.48550/ARXIV.2203.03850
Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S, Tufano M, Deng SK, Clement C, Drain D, Sundaresan N, Yin J, Jiang D, Zhou M (2021) GraphCodeBERT: pre-training code representations with data flow. In: International conference on learning representations. https://openreview.net/forum?id=jLoC4ez43PZ
Hauser B (2021) py-solc. Ethereum. https://github.com/ethereum/py-solc
Hewa T, Ylianttila M, Liyanage M (2021) Survey on blockchain based smart contracts: applications, opportunities and challenges. J Netw Comput Appl 177:102857. https://doi.org/10.1016/j.jnca.2020.102857
Zhuang Y, Liu Z, Qian P, Liu Q, Wang X, He Q (2020) Smart contract vulnerability detection using graph neural network. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI 2020
Jeon S, Lee G, Kim H, Woo SS (2021) SmartConDetect: highly accurate smart contract code vulnerability detection mechanism using BERT. In: KDD workshop on programming language processing (PLP)
Jiang L, Misherghi G, Su Z, Glondu S (2007) DECKARD: scalable and accurate tree-based detection of code clones. In: 29th international conference on software engineering (ICSE’07), IEEE, pp 96–105
Kanade A, Maniatis P, Balakrishnan G, Shi K (2020) Learning and evaluating contextual embedding of source code. In: International conference on machine learning, PMLR, pp 5110–5121
Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement C, Drain D, Jiang D, Tang D, et al (2021) CodeXGLUE: a machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664
Lutz O, Chen H, Fereidooni H, Sendner C, Dmitrienko A, Sadeghi AR, Koushanfar F (2021) ESCORT: ethereum smart COntRacTs vulnerability detection using deep neural network and transfer learning. arXiv preprint arXiv:2103.12607
Momeni P, Wang Y, Samavi R (2019) Machine learning model for smart contracts security analysis. In: 2019 17th international conference on privacy, security and trust (PST), IEEE, pp 1–6
Palladino S (2017) The Parity wallet hack explained. OpenZeppelin blog. https://blog.openzeppelin.com/on-the-parity-wallet-multisig-hack-405a8c12e8f7
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
Popper N (2016) A hacking of more than $50 million dashes hopes in the world of virtual currency. The New York Times
Reproducibility: PyTorch 1.13 documentation. https://pytorch.org/docs/stable/notes/randomness.html. Accessed 24 Dec 2022
Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, Ellingwood P, McConley M (2018) Automated vulnerability detection in source code using deep representation learning. In: 2018 17th IEEE international conference on machine learning and applications (ICMLA), pp 757–762. https://doi.org/10.1109/ICMLA.2018.00120
SafeMath (2017) https://github.com/OpenZeppelin/zeppelin-solidity/blob/master/contracts/math/SafeMath.sol
Samreen NF, Alalfi MH (2021) A survey of security vulnerabilities in ethereum smart contracts. arXiv preprint arXiv:2105.06974
Solidity v0.5.0 Breaking Changes (2016) https://docs.soliditylang.org/en/latest/050-breaking-changes.html#
Swathi B, Anju R (2019) Reformulation of natural language queries on source code base using NLP techniques. Compusoft 8(2):3047–3052
Szabo N (1997) Formalizing and securing relationships on public networks. First Monday
Team TE (2021) Ethereum (ETH) blockchain explorer. https://etherscan.io/. Accessed 21 May 2021
Tikhomirov S, Voskresenskaya E, Ivanitskiy I, Takhaviev R, Marchenko E, Alexandrov Y (2018) SmartCheck: static analysis of Ethereum smart contracts. In: 2018 IEEE/ACM 1st international workshop on emerging trends in software engineering for blockchain (WETSEB), pp 9–16
van Dam JK (2016) Identifying source code programming languages through natural language processing. MS thesis, Faculty Sci., Math. Inform., Univ. Amsterdam, Amsterdam
Wang W, Song J, Xu G, Li Y, Wang H, Su C (2020) ContractWard: automated vulnerability detection models for Ethereum smart contracts. IEEE Trans Netw Sci Eng 8(2):1133–1144
Wang X, Wang Y, Mi F, Zhou P, Wan Y, Liu X, Li L, Wu H, Liu J, Jiang X (2021) SynCoBERT: syntax-guided multi-modal contrastive pre-training for code representation. arXiv preprint arXiv:2108.04556. https://doi.org/10.48550/ARXIV.2108.04556
Wu J (2021) Literature review on vulnerability detection using NLP technology
Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. arXiv preprint arXiv:1704.01696
Acknowledgements
We thank the anonymous reviewers and the editor for their invaluable comments and feedback, which greatly improved this work. This work was partly supported by Institute for Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korean government (MSIT): (No. 2022-0-01199, Graduate School of Convergence Security at Sungkyunkwan University), (No. 2022-0-01045, Self-directed Multi-Modal Intelligence for solving unknown, open domain problems), (No. 2022-0-00688, AI Platform to Fully Adapt and Reflect Privacy-Policy Changes), (No. 2021-0-02068, Artificial Intelligence Innovation Hub), (No. 2019-0-00421, AI Graduate School Support Program at Sungkyunkwan University), and (No. RS-2023-00230337, Advanced and Proactive AI Platform Research and Development Against Malicious deepfakes).
Ethics declarations
Conflict of interest
The authors have declared no conflicts of interest for this article.
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jeon, S., Lee, G., Kim, H. et al. Design and evaluation of highly accurate smart contract code vulnerability detection framework. Data Min Knowl Disc (2023). https://doi.org/10.1007/s10618-023-00981-1
DOI: https://doi.org/10.1007/s10618-023-00981-1