TSGN: Transaction Subgraph Networks for Identifying Ethereum Phishing Accounts

Wang, Jinhuan; Chen, Pengtao; Yu, Shanqing; Xuan, Qi

doi:10.1007/978-981-16-7993-3_15


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1490)

Included in the following conference series:

International Conference on Blockchain and Trustworthy Systems


Abstract

Blockchain technology, and blockchain-based transactions in particular, provide information never before available in the financial world. Unlike transactions in fiat currencies, transactions in virtual currencies such as Bitcoin are completely public, and they are permanently recorded on the blockchain and available at any time. This allows us to build transaction networks (TNs) and analyze illegal activities on the blockchain, such as phishing scams, from a network perspective. In this paper, we propose a Transaction SubGraph Network (TSGN) based classification model to identify phishing accounts in Ethereum. First, we extract a transaction subgraph for each address and then expand these subgraphs into corresponding TSGNs based on different mapping mechanisms. We find that TSGNs provide additional latent information that benefits the identification of phishing accounts. Moreover, Directed-TSGNs, by introducing direction attributes, retain the transaction flow information that captures the significant topological patterns of phishing scams. Compared with TSGN, Directed-TSGN has much lower time complexity, which benefits graph representation learning. Experimental results demonstrate that, combined with network representation algorithms, the TSGN model captures more features, which enhances the classification algorithm and improves the accuracy of identifying phishing nodes in the Ethereum network.
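The two steps named in the abstract (per-address subgraph extraction, then mapping to a subgraph network) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the transaction subgraph of an address is assumed to be its `hops`-step neighborhood (the parameter is hypothetical), transactions are assumed to be unique (sender, receiver) pairs, and the TSGN mapping is approximated by a line-graph construction in which every transaction edge becomes a node. In the undirected variant two edges are linked whenever they share an account; in the direction-aware variant edge (u, v) is linked only to edges (v, w), so links follow the flow of funds. All function names are illustrative.

```python
from collections import defaultdict


def ego_subgraph(edges, center, hops=1):
    """Keep the transactions whose two accounts both lie within `hops`
    steps of `center` (transfer direction is ignored when measuring distance)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    reached, frontier = {center}, {center}
    for _ in range(hops):
        frontier = {w for node in frontier for w in nbrs[node]} - reached
        reached |= frontier
    return [(u, v) for u, v in edges if u in reached and v in reached]


def undirected_mapping(edges):
    """Line-graph style mapping: every transaction edge becomes a node, and two
    such nodes are linked when the original edges share an account."""
    links = set()
    for a in range(len(edges)):
        for b in range(a + 1, len(edges)):
            if set(edges[a]) & set(edges[b]):
                links.add((edges[a], edges[b]))
    return links


def directed_mapping(edges):
    """Direction-aware mapping: edge (u, v) is linked to edge (v, w) only,
    so every link follows a possible flow of funds u -> v -> w."""
    return {(e, f) for e in edges for f in edges if e != f and e[1] == f[0]}


# Toy account p collects from a, b, c and forwards to x; x later pays y.
txs = [("a", "p"), ("b", "p"), ("c", "p"), ("p", "x"), ("x", "y")]
sub = ego_subgraph(txs, "p")          # drops ("x", "y"), which is 2 hops away
print(len(undirected_mapping(sub)))   # 6
print(len(directed_mapping(sub)))     # 3
```

In this toy case the undirected mapping links all 6 pairs of edges meeting at `p`, while the directed mapping keeps only the 3 flows through `p`. For graphs without self-loops or reciprocal transfers, the undirected construction creates $\sum_v \binom{k_v}{2}$ links and the directed one $\sum_v k_v^{in} k_v^{out}$, which illustrates (but does not prove) why a direction-aware mapping can produce smaller graphs.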



Acknowledgments

This work was partially supported by the National Key R&D Program of China under Grant No. 2020YFB1006104, by the National Natural Science Foundation of China under Grant No. 61973273, and by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR19F030001.

Author information

Authors and Affiliations

Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou, 310023, China
Jinhuan Wang, Pengtao Chen, Shanqing Yu & Qi Xuan
College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
Jinhuan Wang, Pengtao Chen, Shanqing Yu & Qi Xuan
PCL Research Center of Networks and Communications, Peng Cheng Laboratory, Shenzhen, 518000, China
Qi Xuan


Corresponding author

Correspondence to Qi Xuan.

Editor information

Editors and Affiliations

Macau University of Science and Technology, Macao, China
Hong-Ning Dai
Peking University, Beijing, China
Xuanzhe Liu
Hong Kong Polytechnic University, Hong Kong, China
Daniel Xiapu Luo
Huazhong University of Science and Technology, Wuhan, Hubei, China
Jiang Xiao
Sun Yat-sen University, Guangzhou, China
Xiangping Chen

7 Appendix

Number of Graph Nodes (N): The number of nodes in the graph.
Number of Graph Edges (E): The number of edges in the graph.
Average Degree ($D_A$): The mean number of edges connected to a node in the graph; for an undirected graph, $D_A=2E/N$.
Percentage of Leaf Nodes (P): A node is a leaf node if its degree is 1. If there are l leaf nodes in the graph, the percentage of leaf nodes is $P=l/N$.
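The four counting features above can be computed with a few lines of standard-library Python. This minimal sketch assumes the graph is treated as a simple undirected graph given as an edge list: duplicate or reciprocal edges and self-loops are dropped, and isolated nodes, which never appear in an edge list, are not counted. The function name `basic_features` is illustrative.

```python
from collections import Counter


def basic_features(edges):
    """N, E, average degree D_A = 2E/N and leaf fraction P = l/N of a simple
    undirected graph given as a list of (u, v) pairs."""
    # Treat the graph as undirected: drop self-loops and duplicate/reciprocal edges.
    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    degree = Counter()
    for edge in edge_set:
        for node in edge:
            degree[node] += 1
    n, m = len(degree), len(edge_set)
    leaves = sum(1 for d in degree.values() if d == 1)  # l in P = l/N
    return {"N": n, "E": m, "D_A": 2 * m / n, "P": leaves / n}


# Star: one hub account linked to three others.
print(basic_features([(0, 1), (0, 2), (0, 3)]))
# {'N': 4, 'E': 3, 'D_A': 1.5, 'P': 0.75}
```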
Average Clustering Coefficient ($C_{coef}$): The clustering coefficient is a classic measure of the edge density of a node's ego network. Suppose node $v_i$ has $m_i$ neighbors, among which there are $e_i$ edges. The average clustering coefficient of the graph is then defined as
$$\begin{aligned} C_{coef}=\frac{1}{N}\sum _{i=1}^N\frac{2e_i}{m_i(m_i-1)}\,. \end{aligned}\tag{2}$$
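Eq. (2) can be evaluated directly from an adjacency-set representation. The sketch below assumes an undirected graph without self-loops, stored as a dict mapping each node to the set of its neighbors. Because Eq. (2) is undefined for $m_i<2$, such nodes are assigned a coefficient of 0 here; this is a common convention but an assumption with respect to the paper.

```python
from itertools import combinations


def average_clustering(adj):
    """Eq. (2): mean over all nodes of 2 e_i / (m_i (m_i - 1)).

    `adj` maps each node to the set of its neighbours (undirected, no self-loops).
    Nodes with fewer than two neighbours contribute 0 (assumed convention).
    """
    total = 0.0
    for node, nbrs in adj.items():
        m_i = len(nbrs)
        if m_i < 2:
            continue
        # e_i: number of edges among the neighbours of `node`
        e_i = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * e_i / (m_i * (m_i - 1))
    return total / len(adj)


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(average_clustering(adj))  # (1/3 + 1 + 1 + 0) / 4 = 7/12 ≈ 0.583
```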
Largest Eigenvalue of the Adjacency Matrix ($\lambda $): A graph G can be represented by its adjacency matrix $A\in \mathbb {R}^{N\times N}$. Since the eigenvalues of A are invariant under graph isomorphism, we adopt the largest eigenvalue $\lambda $ of A as a graph feature.
Network Density ($D_N$): For a network with N nodes and E edges, the network density is defined as $D_N=2E/\left(N(N-1)\right)$, i.e., the fraction of all possible node pairs that are connected.
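The largest eigenvalue and the density can likewise be sketched without external libraries. Here $\lambda$ is estimated by power iteration on $A+I$: the identity shift leaves the eigenvectors unchanged but keeps the iteration from oscillating on bipartite graphs (where $-\lambda$ is also an eigenvalue), and the final estimate is the Rayleigh quotient. The input format (undirected adjacency sets), tolerances, and function names are assumptions; in practice a linear-algebra routine such as NumPy's `numpy.linalg.eigvalsh` would be the more usual choice.

```python
import math


def largest_eigenvalue(adj, iters=10_000, tol=1e-12):
    """Largest eigenvalue of the symmetric adjacency matrix A via power iteration.

    Iterating with A + I instead of A shifts every eigenvalue by +1, which keeps
    the iteration from oscillating on bipartite graphs; the eigenvector is unchanged.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}  # positive start vector
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}  # y = (A + I) x
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        converged = max(abs(y[v] - x[v]) for v in nodes) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x (x has unit length here)
    return sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)


def network_density(n, e):
    """D_N = 2E / (N (N - 1)) for a simple undirected graph with N >= 2."""
    return 2 * e / (n * (n - 1))


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(largest_eigenvalue(path))  # close to sqrt(2) ≈ 1.41421
print(network_density(3, 2))     # 2 of the 3 possible pairs are linked
```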
Average Betweenness Centrality ($C_{betw}$): In a connected network, every pair of nodes is joined by at least one shortest path, i.e., a path with the minimum number of edges. Betweenness centrality measures the centrality of a node based on these shortest paths; for node $v_i$ it is defined as
$$\begin{aligned} C_{betw}(i)=\sum _{m\ne i\ne n} {\frac{e_{mn}(i)}{e_{mn}}}\,, \end{aligned}\tag{3}$$
where $e_{mn}$ is the number of shortest paths between $v_m$ and $v_n$, and $e_{mn}(i)$ is the number of those shortest paths that pass through $v_i$. $C_{betw}(i)$ thus reflects the importance of $v_i$ as a bridge node.

Then, the average betweenness centrality of the network can be calculated as
$$\begin{aligned} C_{betw}=\frac{1}{N}\sum _{i=1}^N{C_{betw}(i)}\,. \end{aligned}\tag{4}$$
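Eqs. (3) and (4) can be evaluated literally. A breadth-first search from every node yields hop distances $d$ and shortest-path counts $\sigma$ (helper notation introduced here, not in the paper), and $v_i$ lies on $e_{mn}(i)=\sigma_{mi}\sigma_{in}$ shortest paths between $v_m$ and $v_n$ exactly when $d(m,i)+d(i,n)=d(m,n)$. This sketch assumes an unweighted, undirected graph in adjacency-set form and sums over ordered pairs $(m,n)$ as Eq. (3) is written, so each unordered pair counts twice; divide by 2 if unordered pairs are intended. The cubic pair loop is only meant for small subgraphs; Brandes' algorithm computes the same quantity more efficiently.

```python
from collections import deque


def bfs_counts(adj, source):
    """Hop distances and numbers of shortest paths from `source` (unweighted BFS)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:  # first time w is reached
                dist[w] = dist[v] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                sigma[w] += sigma[v]
    return dist, sigma


def average_betweenness(adj):
    """Eqs. (3)-(4) evaluated literally over ordered pairs (m, n), unnormalised."""
    info = {v: bfs_counts(adj, v) for v in adj}
    total = 0.0
    for i in adj:
        dist_i, sigma_i = info[i]
        for m in adj:
            dist_m, sigma_m = info[m]
            for n in adj:
                if len({m, n, i}) < 3 or n not in dist_m or i not in dist_m:
                    continue
                # v_i lies on sigma_mi * sigma_in shortest m-n paths iff it sits on a geodesic
                if dist_m[i] + dist_i[n] == dist_m[n]:
                    total += sigma_m[i] * sigma_i[n] / sigma_m[n]
    return total / len(adj)


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(average_betweenness(path))  # node 1 scores 2 (pairs (0,2) and (2,0)), so 2/3
```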
Average Closeness Centrality ($C_{close}$): Closeness centrality is also based on shortest paths; it takes into account the shortest paths from a node to all other nodes. In a connected network, the closeness centrality of a node is the reciprocal of the average shortest-path length between this node and all the others. The average closeness centrality of the network is defined as
$$\begin{aligned} C_{close}=\frac{1}{N}\sum _{i=1}^N\frac{N-1}{\sum _{j=1}^N e_{ij}}\,, \end{aligned}\tag{5}$$
where $e_{ij}$ is the shortest-path length between nodes $v_i$ and $v_j$ (with $e_{ii}=0$).
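A sketch of Eq. (5) follows, with the inner sum taken over all N nodes of a connected, unweighted, undirected network (an assumption consistent with the definition's requirement of connectivity). The adjacency-set input and the function names are illustrative; the function raises an error rather than guessing a convention for disconnected graphs.

```python
from collections import deque


def hop_distances(adj, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_closeness(adj):
    """Eq. (5) with the inner sum running over all N nodes (e_ii = 0):
    the mean over nodes of (N - 1) / sum_j e_ij. Requires a connected graph
    with at least two nodes."""
    n = len(adj)
    total = 0.0
    for v in adj:
        dist = hop_distances(adj, v)
        if len(dist) != n:
            raise ValueError("Eq. (5) assumes a connected network")
        total += (n - 1) / sum(dist.values())
    return total / n


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(average_closeness(path))  # (2/3 + 1 + 2/3) / 3 = 7/9 ≈ 0.778
```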
Average Neighbor Degree ($D_{neighbor}$): The neighbor degree of a node is the average degree of all its neighbors, which captures 2-hop information. The neighbor degree of node $v_i$ is calculated as
$$\begin{aligned} D_{neighbor}(i)= \frac{1}{k_i}{\sum _{v_j\in \mathcal {N}_i} k_j}\,, \end{aligned}\tag{6}$$
where $\mathcal {N}_i$ is the neighbor set of node $v_i$, and $k_i$ and $k_j$ are the degrees of nodes $v_i$ and $v_j\in \mathcal {N}_i$, respectively. The average neighbor degree of the network is then
$$\begin{aligned} D_{neighbor}=\frac{1}{N}\sum _{i=1}^N {D_{neighbor}(i)}\,. \end{aligned}\tag{7}$$
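Eqs. (6) and (7) reduce to two nested sums over the adjacency sets. In this sketch an isolated node ($k_i=0$), for which Eq. (6) is undefined, contributes 0 to the average; that convention, the undirected adjacency-set input, and the function name are assumptions.

```python
def average_neighbor_degree(adj):
    """Eqs. (6)-(7): D_neighbor(i) is the mean degree k_j over v_j in N_i,
    and the network value is the mean of D_neighbor(i) over all N nodes."""
    total = 0.0
    for node, nbrs in adj.items():
        if nbrs:  # isolated nodes (k_i = 0) contribute 0 by assumption
            total += sum(len(adj[j]) for j in nbrs) / len(nbrs)
    return total / len(adj)


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(average_neighbor_degree(star))  # (1 + 3 + 3 + 3) / 4 = 2.5
```

In the star, the hub's neighbors all have degree 1 while each leaf sees the hub's degree of 3, so the 2-hop view separates hub-like accounts from their counterparties even though both appear in the same subgraph.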


About this paper

Cite this paper

Wang, J., Chen, P., Yu, S., Xuan, Q. (2021). TSGN: Transaction Subgraph Networks for Identifying Ethereum Phishing Accounts. In: Dai, HN., Liu, X., Luo, D.X., Xiao, J., Chen, X. (eds) Blockchain and Trustworthy Systems. BlockSys 2021. Communications in Computer and Information Science, vol 1490. Springer, Singapore. https://doi.org/10.1007/978-981-16-7993-3_15


DOI: https://doi.org/10.1007/978-981-16-7993-3_15
Published: 01 January 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7992-6
Online ISBN: 978-981-16-7993-3
eBook Packages: Computer Science, Computer Science (R0)
