Abstract
Blockchain technology, and in particular blockchain-based transactions, offers information that has never been available before in the financial world. In contrast to fiat currencies, transactions in virtual currencies such as Bitcoin are completely public: they are permanently recorded on the blockchain and can be accessed at any time. This allows us to build transaction networks (TN) and analyze illegal phenomena such as phishing scams on the blockchain from a network perspective. In this paper, we propose a Transaction SubGraph Network (TSGN)-based classification model to identify phishing accounts in Ethereum. First, we extract a transaction subgraph for each address and then expand these subgraphs into corresponding TSGNs based on different mapping mechanisms. We find that TSGNs provide additional latent information that benefits the identification of phishing accounts. Moreover, by introducing direction attributes, Directed-TSGNs retain the transaction flow information that captures the significant topological patterns of phishing scams. Compared with TSGN, Directed-TSGN has much lower time complexity, which benefits graph representation learning. Experimental results demonstrate that, combined with network representation algorithms, the TSGN model captures more features, which enhances the classification algorithm and improves the identification accuracy of phishing nodes in the Ethereum network.
Acknowledgments
This work was partially supported by the National Key R&D Program of China under Grant No. 2020YFB1006104, by the National Natural Science Foundation of China under Grant No. 61973273, and by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR19F030001.
7 Appendix
- Number of Graph Nodes (N): The number of nodes in the graph.
- Number of Graph Edges (E): The number of edges in the graph.
- Average Degree (\(D_A\)): The mean number of edges connected to a node in the graph, i.e., \(D_A=2E/N\) for an undirected graph.
- Percentage of Leaf Nodes (P): A node is defined as a leaf node if its degree is 1. If there are \(l\) leaf nodes in the graph, the percentage of leaf nodes is \(P=l/N\).
- Average Clustering Coefficient (\(C_{coef}\)): The clustering coefficient is a classic measure of the edge density of a node's ego-network. Suppose node \(v_i\) has \(m_i\) neighbors and there are \(e_i\) edges among these neighbors. The average clustering coefficient of the graph is then defined as
$$\begin{aligned} C_{coef}=\frac{1}{N}\sum _{i=1}^N\frac{2e_i}{m_i(m_i-1)}\,. \end{aligned}\tag{2}$$
- Largest Eigenvalue of the Adjacency Matrix (\(\lambda \)): A graph G can be represented by its adjacency matrix \(A\in \mathbb {R}^{N\times N}\). Since the eigenvalues of A are invariant under graph isomorphism, we adopt the largest eigenvalue \(\lambda \) of A as a graph feature.
- Network Density (\(D_N\)): Given a network with N nodes and E edges, the network density is defined as \(D_N=2E/(N(N-1))\).
- Average Betweenness Centrality (\(C_{betw}\)): In a connected network, every pair of nodes is joined by at least one shortest path, i.e., a path with the minimum number of edges. Betweenness centrality is a measure of centrality based on these shortest paths. The betweenness centrality of node \(v_i\) is defined as
$$\begin{aligned} C_{betw}(i)=\sum _{m\ne i\ne n} {\frac{e_{mn}(i)}{e_{mn}}}\,, \end{aligned}\tag{3}$$
where \(e_{mn}\) is the number of shortest paths between \(v_m\) and \(v_n\), and \(e_{mn}(i)\) is the number of those shortest paths that pass through \(v_i\). \(C_{betw}(i)\) reflects the importance of node \(v_i\) as a bridge node. The average betweenness centrality of the network is then
$$\begin{aligned} C_{betw}=\frac{1}{N}\sum _{i=1}^N{C_{betw}(i)}\,. \end{aligned}\tag{4}$$
- Average Closeness Centrality (\(C_{close}\)): Closeness centrality is also based on shortest paths, taking into account the shortest paths from each node to all other nodes. In a connected network, the closeness centrality of a node is the reciprocal of the average shortest path length between this node and the other nodes. The average closeness centrality of the network is defined as
$$\begin{aligned} C_{close}=\frac{1}{N}\sum _{i=1}^N\frac{{k-1}}{{\sum _{j=1}^k {{e_{ij}}} }}\,, \end{aligned}\tag{5}$$
where \(e_{ij}\) is the shortest path length between nodes \(v_i\) and \(v_j\), and \(k\) is the number of nodes reachable from \(v_i\) (including \(v_i\) itself), so that \(k=N\) in a connected network.
- Average Neighbor Degree (\(D_{neighbor}\)): The neighbor degree of a node is the average degree of all its neighbors, which captures 2-hop information. The neighbor degree of node \(v_i\) is calculated as
$$\begin{aligned} D_{neighbor}(i)= \frac{1}{k_i}{\sum _{v_j\in \mathcal {N}_i} k_j}\,, \end{aligned}\tag{6}$$
where \(\mathcal {N}_i\) is the neighbor set of node \(v_i\), and \(k_i\) and \(k_j\) are the degrees of nodes \(v_i\) and \(v_j\in \mathcal {N}_i\), respectively. The average neighbor degree of the network is then
$$\begin{aligned} D_{neighbor}=\frac{1}{N}\sum _{i=1}^N {D_{neighbor}(i)}\,. \end{aligned}\tag{7}$$
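The degree-based quantities above (N, E, \(D_A\), P, \(C_{coef}\), \(D_N\) and \(D_{neighbor}\)) can all be computed in one pass over an adjacency structure. The following is a minimal pure-Python sketch, not the authors' implementation: it assumes each subgraph is treated as an undirected, unweighted simple graph with at least one node, and the helper names `build_graph` and `local_features` are hypothetical. Nodes with fewer than two neighbors contribute 0 to \(C_{coef}\), a common convention that the appendix does not state.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Hypothetical helper: undirected simple graph as adjacency sets.

    Self-loops and duplicate edges are dropped (an assumption of this sketch).
    """
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def local_features(adj: Graph) -> Dict[str, float]:
    """Degree-based features of an undirected simple graph with N >= 1 nodes."""
    n = len(adj)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    e = sum(deg.values()) // 2  # every undirected edge is counted at both endpoints

    clustering = 0.0
    neighbor_degree = 0.0
    for v, nbrs in adj.items():
        m = deg[v]
        if m >= 2:
            # e_i: edges among the neighbours of v (each seen from both ends, hence // 2)
            e_i = sum(len(adj[u] & nbrs) for u in nbrs) // 2
            clustering += 2 * e_i / (m * (m - 1))  # term of Eq. (2); m < 2 contributes 0
        if m >= 1:
            neighbor_degree += sum(deg[u] for u in nbrs) / m  # Eq. (6)

    return {
        "N": n,
        "E": e,
        "D_A": 2 * e / n,
        "P": sum(1 for d in deg.values() if d == 1) / n,
        "C_coef": clustering / n,  # Eq. (2)
        "D_N": 2 * e / (n * (n - 1)) if n > 1 else 0.0,
        "D_neighbor": neighbor_degree / n,  # Eq. (7)
    }


if __name__ == "__main__":
    # Toy graph: triangle a-b-c plus a pendant node d attached to c
    g = build_graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    print(local_features(g))
```

For this toy graph the sketch yields \(N=4\), \(E=4\), \(D_A=2\), \(P=0.25\), \(C_{coef}=7/12\), \(D_N=2/3\) and \(D_{neighbor}=29/12\).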
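The largest adjacency eigenvalue \(\lambda\) can be estimated without a linear-algebra library by power iteration. The sketch below is an illustration under assumptions, not the method used in the paper: it assumes an undirected graph (so A is symmetric) given as a dictionary of neighbor sets, and `largest_adjacency_eigenvalue` is a hypothetical helper. It iterates with \(A+I\) instead of A because on bipartite graphs (a star, for example) \(-\lambda\) is also an eigenvalue of A and plain power iteration can oscillate; the shift makes \(\lambda+1\) strictly dominant without changing the eigenvector.

```python
import math
from typing import Dict, Hashable, Set


def largest_adjacency_eigenvalue(
    adj: Dict[Hashable, Set[Hashable]], max_iter: int = 10_000, tol: float = 1e-12
) -> float:
    """Largest eigenvalue of a symmetric adjacency matrix via shifted power iteration.

    Assumes an undirected graph: u in adj[v] if and only if v in adj[u].
    """
    nodes = list(adj)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}  # positive start vector overlaps the Perron vector
    lam = 0.0
    for _ in range(max_iter):
        # One step with (A + I): y_v = x_v + sum of x over the neighbours of v
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient x^T A x of the unit vector x estimates lambda
        new_lam = sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


if __name__ == "__main__":
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    print(largest_adjacency_eigenvalue(star))  # approximately sqrt(3) = 1.7320508...
```

For a dense matrix, a symmetric eigensolver would be the more direct choice; power iteration is shown here only because it needs nothing beyond neighbor lookups.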
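Both shortest-path features can be obtained from one breadth-first search per source node. The sketch below uses Brandes' dependency-accumulation algorithm for Eq. (3), a standard technique that the appendix does not mention, so it is an illustrative assumption rather than necessarily how the authors computed the feature. It further assumes an undirected, unweighted graph given as neighbor sets, sums Eq. (3) over ordered pairs \((v_m, v_n)\) (halve the result if each unordered pair should be counted once), and reads k in Eq. (5) as the number of nodes reachable from \(v_i\), including \(v_i\) itself. The names `_bfs` and `path_features` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def _bfs(adj: Graph, s: Hashable) -> Tuple[Dict, Dict, Dict, List]:
    """BFS from s: distances, shortest-path counts, predecessors, visit order."""
    dist = {s: 0}
    sigma = {s: 1}  # sigma[w] = number of shortest s-w paths
    preds: Dict[Hashable, List[Hashable]] = {s: []}
    order: List[Hashable] = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                preds[w] = []
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def path_features(adj: Graph) -> Dict[str, float]:
    """Average betweenness (Eqs. 3-4) and closeness (Eq. 5) of an undirected graph."""
    n = len(adj)
    betw = {v: 0.0 for v in adj}
    closeness_total = 0.0
    for s in adj:
        dist, sigma, preds, order = _bfs(adj, s)
        # Eq. (5): (k - 1) / sum of distances, k = nodes reachable from s (incl. s)
        total = sum(dist.values())
        if total > 0:
            closeness_total += (len(dist) - 1) / total
        # Brandes: accumulate the dependency of s on each node, farthest nodes first
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betw[w] += delta[w]  # ordered pairs: each unordered pair counted twice
    return {
        "C_betw": sum(betw.values()) / n,  # Eq. (4)
        "C_close": closeness_total / n,  # Eq. (5)
    }


if __name__ == "__main__":
    path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
    print(path_features(path))
```

For the three-node path a-b-c, node b lies on the only shortest path between a and c, so the sketch gives \(C_{betw}=2/3\) under the ordered-pair reading (1/3 with unordered pairs) and \(C_{close}=(2/3+1+2/3)/3=7/9\).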
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, J., Chen, P., Yu, S., Xuan, Q. (2021). TSGN: Transaction Subgraph Networks for Identifying Ethereum Phishing Accounts. In: Dai, HN., Liu, X., Luo, D.X., Xiao, J., Chen, X. (eds) Blockchain and Trustworthy Systems. BlockSys 2021. Communications in Computer and Information Science, vol 1490. Springer, Singapore. https://doi.org/10.1007/978-981-16-7993-3_15
DOI: https://doi.org/10.1007/978-981-16-7993-3_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7992-6
Online ISBN: 978-981-16-7993-3
eBook Packages: Computer Science; Computer Science (R0)