
TSGN: Transaction Subgraph Networks for Identifying Ethereum Phishing Accounts

  • Conference paper
Blockchain and Trustworthy Systems (BlockSys 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1490)


Abstract

Blockchain technology, and blockchain-based transactions in particular, offer information that has never been available before in the financial world. Unlike transactions in fiat currencies, transactions in cryptocurrencies such as Bitcoin are completely public: they are permanently recorded on the blockchain and can be accessed at any time. This allows us to build transaction networks (TNs) and analyze illegal phenomena in blockchains, such as phishing scams, from a network perspective. In this paper, we propose a classification model based on Transaction SubGraph Networks (TSGNs) to identify phishing accounts in Ethereum. We first extract a transaction subgraph for each address and then expand these subgraphs into the corresponding TSGNs using different mapping mechanisms. We find that TSGNs provide additional latent information that benefits the identification of phishing accounts. Moreover, by introducing direction attributes, Directed-TSGNs retain the transaction-flow information that captures the significant topological patterns of phishing scams. Compared with TSGN, Directed-TSGN has much lower time complexity, which benefits graph representation learning. Experimental results demonstrate that, combined with network representation algorithms, the TSGN model captures more features, enhancing the classification algorithm and improving the accuracy of identifying phishing nodes in the Ethereum network.
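The abstract does not spell out how subgraphs are extracted or how the mapping mechanisms work, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that a transaction subgraph is the 1-hop ego network of an address, that the undirected TSGN mapping follows the first-order subgraph-network idea of Xuan et al. (each transaction edge becomes a node, and two such nodes are linked when the original edges share an address), and that the directed variant links (u, v) to (v, w) only when funds flow onward through v. The function names `transaction_subgraph`, `tsgn`, and `directed_tsgn` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def transaction_subgraph(transactions, address):
    """Hypothetical extraction rule: the 1-hop ego network of `address`.

    `transactions` is an iterable of (sender, receiver) pairs. Repeated
    transfers collapse into one directed edge; self-transfers are dropped.
    """
    edges = {(s, r) for s, r in transactions if s != r}
    nodes = {address}
    nodes |= {r for s, r in edges if s == address}  # addresses it paid
    nodes |= {s for s, r in edges if r == address}  # addresses that paid it
    # Keep every transaction among the address and its counterparties.
    return {(s, r) for s, r in edges if s in nodes and r in nodes}


def tsgn(edges):
    """Assumed undirected mapping: each transaction edge becomes a node, and
    two such nodes are linked when the original edges share an address."""
    incident = defaultdict(list)
    for edge in edges:
        for endpoint in edge:
            incident[endpoint].append(edge)
    links = set()
    for group in incident.values():
        for a, b in combinations(group, 2):
            links.add(frozenset((a, b)))  # unordered link between two edges
    return set(edges), links


def directed_tsgn(edges):
    """Assumed directed mapping: (u, v) links to (v, w) only when the flow of
    funds continues through v, so typically fewer links are created."""
    outgoing = defaultdict(list)
    for u, v in edges:
        outgoing[u].append((u, v))
    links = {((u, v), nxt) for u, v in edges for nxt in outgoing[v]}
    return set(edges), links


if __name__ == "__main__":
    txs = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("X", "Y")]
    sub = transaction_subgraph(txs, "A")
    print(len(sub), len(tsgn(sub)[1]), len(directed_tsgn(sub)[1]))  # 4 5 4
```

On this toy example the directed mapping keeps only flow-consistent links (4 versus 5), which hints at why a direction-aware mapping can be cheaper; the counts illustrate the sketch only and are not results from the paper.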



Acknowledgments

This work was partially supported by the National Key R&D Program of China under Grant No. 2020YFB1006104, by the National Natural Science Foundation of China under Grant No. 61973273, and by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LR19F030001.

Author information

Corresponding author

Correspondence to Qi Xuan.


7 Appendix


  • Number of Graph Nodes (N): The number of nodes in the graph.

  • Number of Graph Edges (E): The number of edges in the graph.

  • Average Degree (\(D_A\)): The mean number of edges connected to a node in the graph.

  • Percentage of Leaf Nodes (P): A node is a leaf node if its degree is 1. If the graph contains l leaf nodes, the percentage of leaf nodes is \(P=l/N\).

  • Average Clustering Coefficient (\(C_{coef}\)): The clustering coefficient is a classic measure of the edge density within a node's ego network. Suppose node \(v_i\) has \(m_i\) neighbors and there are \(e_i\) edges among these neighbors. The average clustering coefficient of the graph is then defined as

    $$\begin{aligned} C_{coef}=\frac{1}{N}\sum _{i=1}^N\frac{2e_i}{m_i(m_i-1)}\,. \end{aligned}$$
    (2)
  • Largest Eigenvalue of the Adjacency Matrix (\(\lambda \)): A graph G can be represented by its adjacency matrix \(A\in \mathbb {R}^{N\times N}\). Since the eigenvalues of A are invariant under graph isomorphism, we adopt the largest eigenvalue \(\lambda \) of A as a graph feature.

  • Network Density (\(D_N\)): Given a network with N nodes and E edges, the network density is defined as \(D_N=2E/[N(N-1)]\).

  • Average Betweenness Centrality (\(C_{betw}\)): In a connected network, every pair of nodes is joined by at least one shortest path, i.e., a path with the minimum number of edges. Betweenness centrality measures the centrality of a node based on these shortest paths. The betweenness centrality of node \(v_i\) is defined as

    $$\begin{aligned} C_{betw}(i)=\sum _{m\ne i\ne n} {\frac{e_{mn}(i)}{e_{mn}}}\,. \end{aligned}$$
    (3)

    where \(e_{mn}\) is the number of shortest paths between \(v_m\) and \(v_n\), and \(e_{mn}(i)\) is the number of those shortest paths that pass through \(v_i\). \(C_{betw}(i)\) reflects the importance of node \(v_i\) as a bridge.

    Then, the average betweenness centrality of the network can be calculated as

    $$\begin{aligned} C_{betw}=\frac{1}{N}\sum _{i=1}^N{C_{betw}(i)}\,. \end{aligned}$$
    (4)
  • Average Closeness Centrality (\(C_{close}\)): Closeness centrality is also a shortest-path-based measure of centrality, which takes into account the shortest paths from each node to all other nodes. In a connected network, the closeness centrality of a node is the reciprocal of the average shortest-path length between this node and all the others. The average closeness centrality of the network is defined as

    $$\begin{aligned} C_{close}=\frac{1}{N}\sum _{i=1}^N\frac{N-1}{\sum _{j\ne i} e_{ij}}\,, \end{aligned}$$
    (5)

    where \(e_{ij}\) is the shortest path length between nodes \(v_i\) and \(v_j\).

  • Average Neighbor Degree (\(D_{neighbor}\)): The neighbor degree of a node is the average degree of all the neighbors of this node, which can capture the 2-hop information. We can calculate the neighbor degree of the node \(v_i\) as

    $$\begin{aligned} D_{neighbor}(i)= \frac{1}{k_i}{\sum _{v_j\in \mathcal {N}_i} k_j}\,, \end{aligned}$$
    (6)

    where \(\mathcal {N}_i\) is the neighbor set of node \(v_i\), and \(k_i\) and \(k_j\) are the degrees of nodes \(v_i\) and \(v_j\in \mathcal {N}_i\), respectively. The average neighbor degree of the network is then

    $$\begin{aligned} D_{neighbor}=\frac{1}{N}\sum _{i=1}^N {D_{neighbor}(i)} \end{aligned}$$
    (7)
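To make the definitions above concrete, here is a minimal, dependency-free Python sketch that computes all ten features for a single graph. It is not the authors' implementation, and several conventions are assumptions: the graph is undirected, unweighted, and connected, given as a list of (u, v) edges; nodes with fewer than two neighbors get a clustering coefficient of 0; betweenness sums over ordered pairs (m, n) exactly as Eq. (3) is written (halve it for unordered pairs); and \(\lambda \) is obtained by power iteration on A + I, a swapped-in numerical method, rather than a library eigensolver. The name `graph_features` is hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def graph_features(edges):
    """Compute the ten appendix features for an undirected, unweighted graph.

    Assumptions (not from the paper): `edges` is a list of (u, v) pairs,
    self-loops and duplicate edges are dropped, and the graph is connected.
    """
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    nodes = list(adj)
    n = len(nodes)
    deg = {v: len(adj[v]) for v in nodes}
    e = sum(deg.values()) // 2

    # Eq. (2): 2 e_i / (m_i (m_i - 1)); nodes with m_i < 2 score 0 (assumed).
    def clustering(v):
        m = deg[v]
        if m < 2:
            return 0.0
        links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        return 2.0 * links / (m * (m - 1))

    # Power iteration on A + I; the shift avoids oscillation on bipartite
    # graphs such as stars, which are common in transaction subgraphs.
    def largest_eigenvalue(iters=1000):
        x = dict.fromkeys(nodes, 1.0)
        for _ in range(iters):
            y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
            norm = math.sqrt(sum(val * val for val in y.values()))
            x = {v: y[v] / norm for v in nodes}
        # Rayleigh quotient x^T A x for the unit vector x.
        return sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)

    # Eq. (3) via Brandes' algorithm, summed over ordered pairs (m, n).
    def betweenness():
        bc = dict.fromkeys(nodes, 0.0)
        for s in nodes:
            sigma = dict.fromkeys(nodes, 0)  # number of shortest s-paths
            sigma[s] = 1
            dist, preds, order = {s: 0}, {v: [] for v in nodes}, []
            queue = deque([s])
            while queue:
                v = queue.popleft()
                order.append(v)
                for w in adj[v]:
                    if w not in dist:
                        dist[w] = dist[v] + 1
                        queue.append(w)
                    if dist[w] == dist[v] + 1:
                        sigma[w] += sigma[v]
                        preds[w].append(v)
            delta = dict.fromkeys(nodes, 0.0)
            for w in reversed(order):  # accumulate dependencies back to s
                for v in preds[w]:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
                if w != s:
                    bc[w] += delta[w]
        return bc

    # Eq. (5): (N - 1) / sum of BFS distances from v (connected graph).
    def closeness(v):
        dist = {v: 0}
        queue = deque([v])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total = sum(dist.values())
        return (len(dist) - 1) / total if total else 0.0

    bc = betweenness()
    return {
        "N": n,
        "E": e,
        "D_A": 2.0 * e / n,
        "P": sum(1 for v in nodes if deg[v] == 1) / n,
        "C_coef": sum(clustering(v) for v in nodes) / n,
        "lambda": largest_eigenvalue(),
        "D_N": 2.0 * e / (n * (n - 1)),
        "C_betw": sum(bc.values()) / n,
        "C_close": sum(closeness(v) for v in nodes) / n,
        # Eqs. (6)-(7): mean neighbor degree per node, averaged over nodes.
        "D_neighbor": sum(sum(deg[u] for u in adj[v]) / deg[v] for v in nodes) / n,
    }
```

For example, for the three-leaf star `[(0, 1), (0, 2), (0, 3)]` the sketch gives N = 4, E = 3, P = 0.75, D_N = 0.5, and \(\lambda \approx 1.732\) (that is, \(\sqrt{3}\)). Graph libraries such as NetworkX provide equivalent routines, but their default normalizations (for example, of betweenness) may differ from the equations above.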


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, J., Chen, P., Yu, S., Xuan, Q. (2021). TSGN: Transaction Subgraph Networks for Identifying Ethereum Phishing Accounts. In: Dai, HN., Liu, X., Luo, D.X., Xiao, J., Chen, X. (eds) Blockchain and Trustworthy Systems. BlockSys 2021. Communications in Computer and Information Science, vol 1490. Springer, Singapore. https://doi.org/10.1007/978-981-16-7993-3_15


  • DOI: https://doi.org/10.1007/978-981-16-7993-3_15


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-7992-6

  • Online ISBN: 978-981-16-7993-3

  • eBook Packages: Computer Science, Computer Science (R0)
