Detecting Anomalous Subgraphs on Attributed Graphs via Parametric Flow

  • Conference paper
New Frontiers in Artificial Intelligence (JSAI-isAI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9067)

Abstract

Detecting anomalies in structured graph data is becoming a critical task in many applications, such as the analysis of disease infection in communities. To date, however, no efficient method exists for detecting anomalous subgraphs, that is, subgraphs with an abnormal distribution of vertex attributes, on massive attributed graphs with millions of vertices. Here we report that this task can be solved efficiently using a recent graph cut-based formulation. In particular, the full hierarchy of anomalous subgraphs can be obtained simultaneously via the parametric flow algorithm, which allows us to impose a size constraint on anomalous subgraphs. We thoroughly examine the method on synthetic and real-world datasets of various sizes and show that it is more than five orders of magnitude faster than the state-of-the-art method and more effective at detecting anomalous subgraphs.
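
The abstract does not spell out the cut construction, so the following is only a minimal sketch under stated assumptions. It assumes a hypothetical objective in which each vertex carries an anomaly score, a size penalty `eta` is subtracted per selected vertex, and a penalty `lam` is paid for every edge leaving the selection; maximizing an objective of this shape reduces to an s-t minimum cut. Instead of a true parametric flow algorithm such as that of Gallo et al. [11], which would obtain all solutions in one computation, the sketch simply re-solves the cut with a plain Edmonds-Karp routine for each value of `eta`. All names (`select_subgraph`, `scores`, `lam`, `eta`) and the toy data are illustrative and not taken from the paper.

```python
from collections import deque

def residual_reachable(cap, s, t):
    """Push a maximum flow with Edmonds-Karp (cap is modified in place) and
    return the vertices still reachable from s, the source side of a min cut."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:      # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                   # no augmenting path: flow is maximum
            return set(parent)
        path, v = [], t
        while parent[v] is not None:          # walk back from t to s
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= push                 # use up forward capacity
            cap[w][u] += push                 # open the reverse residual edge

def select_subgraph(scores, edges, lam, eta):
    """Assumed objective: maximize sum(score - eta) over selected vertices
    minus lam times the number of edges leaving the selection."""
    s, t = "source", "sink"                   # hypothetical terminal labels
    cap = {v: {} for v in scores}
    cap[s], cap[t] = {}, {}

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)               # make sure the reverse entry exists

    for v, score in scores.items():
        gain = score - eta
        if gain > 0:
            add(s, v, gain)                   # cutting this edge means giving up v
        elif gain < 0:
            add(v, t, -gain)                  # cutting this edge means paying for v
    for u, v in edges:                        # undirected edge: lam in both directions
        add(u, v, lam)
        add(v, u, lam)
    return residual_reachable(cap, s, t) - {s}

# Toy path graph 0-1-2-3-4 with integer scores (illustrative data only).
scores = {0: 9, 1: 7, 2: 4, 3: 1, 4: 6}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
for eta in (8, 5, 3):
    print(eta, sorted(select_subgraph(scores, edges, lam=2, eta=eta)))
```

On this toy graph, lowering `eta` grows the selection from the empty set to {0, 1} and then to all five vertices, so the selections are nested. Vertex 4 is left out at `eta = 5` despite its score of 6 because none of its neighbors is selected, while vertex 3 is pulled in at `eta = 3` despite its score of 1 because it connects selected vertices.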

Notes

  1. This fact is pointed out in [26] but has not been used in any applications. A related result is theoretically analyzed in [16].

  2. Source code is available at http://research.microsoft.com/en-us/downloads/d3adb5f7-49ea-4170-abde-ea0206b25de2/. Since the code can handle only integer parameters, we first transform every parameter to an integer by multiplying it by a constant.

  3. http://www.cais.ntu.edu.sg/~chi/software.html.

  4. http://www.cs.umd.edu/~sen/lbc-proj/LBC.html.

  5. http://snap.stanford.edu/index.html.
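
Note 2 describes converting real-valued parameters to integers by multiplying by a constant. A minimal sketch of that step follows; the function name and the default scale factor of 10^6 are assumptions for illustration, since the notes do not state which constant was used. Multiplying every capacity by the same positive constant leaves the minimum cut unchanged, but the final rounding can alter it when parameters differ by less than 1/scale.

```python
def to_integer_params(values, scale=10**6):
    """Multiply each real-valued parameter by a constant and round to an integer.

    `scale` is a hypothetical choice; a larger value loses less precision.
    """
    return [round(v * scale) for v in values]

# Example: three real-valued parameters become integers on a common scale.
print(to_integer_params([0.5, 1.25, 0.000001]))
```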

References

  1. Aggarwal, C.C.: Outlier Analysis. Springer, New York (2013)

  2. Akoglu, L., McGlohon, M., Faloutsos, C.: oddball: spotting anomalies in weighted graphs. In: Zaki, M.J., Yu, J.X., Ravindran, B., Pudi, V. (eds.) PAKDD 2010. LNCS, vol. 6119, pp. 410–421. Springer, Heidelberg (2010)

  3. Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Min. Knowl. Disc. 29, 1–63 (2014)

  4. Azencott, C.A., Grimm, D., Sugiyama, M., Kawahara, Y., Borgwardt, K.M.: Efficient network-guided multi-locus association mapping with graph cuts. Bioinformatics 29(13), i171–i179 (2013)

  5. Bhaduri, K., Matthews, B.L., Giannella, C.R.: Algorithms for speeding up distance-based outlier detection. In: Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 859–867 (2011)

  6. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 93–104 (2000)

  7. Chakrabarti, D.: AutoPart: parameter-free graph partitioning and outlier detection. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 112–124. Springer, Heidelberg (2004)

  8. Chapelle, O., Schölkopf, B., Zien, A.: A discussion of semi-supervised learning and transduction. In: Chapelle, O., Schölkopf, B., Zien, A. (eds.) Semi-Supervised Learning, Chap. 25, pp. 473–478. MIT Press, Cambridge (2006)

  9. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)

  10. Eberle, W., Holder, L.: Discovering structural anomalies in graph-based data. In: IEEE International Conference on Data Mining (ICDM) Workshop, pp. 393–398 (2007)

  11. Gallo, G., Grigoriadis, M.D., Tarjan, R.E.: A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 18(1), 30–55 (1989)

  12. Gao, J., Liang, F., Fan, W., Wang, C., Sun, Y., Han, J.: On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 813–822 (2010)

  13. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum-flow problem. J. ACM 35(4), 921–940 (1988)

  14. Henderson, K., Eliassi-Rad, T., Faloutsos, C., Akoglu, L., Li, L., Maruhashi, K., Prakash, B.A., Tong, H.: Metric forensics: a multi-level approach for mining volatile graphs. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 163–172 (2010)

  15. Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H., Faloutsos, C.: It’s who you know: graph mining using recursive structural features. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 663–671 (2011)

  16. Kawahara, Y., Nagano, K.: Structured convex optimization under submodular constraints. In: Proceedings of Uncertainty in Artificial Intelligence (UAI), pp. 459–468 (2013)

  17. Lee, H.F., Dooly, D.R.: Algorithms for the constrained maximum-weight connected graph problem. Naval Res. Logistics 43(7), 985–1008 (1996)

  18. Li, N., Sun, H., Chipman, K., George, J., Yan, X.: A probabilistic approach to uncovering attributed graph anomalies. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 82–90 (2014)

  19. Lin, C.Y., Tong, H.: Non-negative residual matrix factorization with application to graph anomaly detection. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 143–153 (2011)

  20. Müller, E., Sanchez, P.I., Mülle, Y., Böhm, K.: Ranking outlier nodes in subspaces of attributed graphs. In: ICDE Workshop, pp. 216–222 (2013)

  21. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)

  22. Noble, C.C., Cook, D.J.: Graph-based anomaly detection. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636 (2003)

  23. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Dover, New York (1998)

  24. Perozzi, B., Akoglu, L., Sánchez, P.I., Müller, E.: Focused clustering and outlier detection in large attributed graphs. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014)

  25. Pham, N., Pagh, R.: A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data. In: Proceedings of the 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 877–885 (2012)

  26. Sugiyama, M., Azencott, C.A., Grimm, D., Kawahara, Y., Borgwardt, K.M.: Multi-task feature selection on multiple networks via maximum flows. In: Proceedings of SIAM International Conference on Data Mining (SDM), pp. 199–207 (2014)

  27. Sugiyama, M., Borgwardt, K.M.: Rapid distance-based outlier detection via sampling. In: Advances in Neural Information Processing Systems, pp. 467–475 (2013)

  28. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 440–442 (1998)

  29. Xu, Z., Ke, Y., Wang, Y., Cheng, H., Cheng, J.: GBAGC: a general Bayesian framework for attributed graph clustering. ACM Trans. Knowl. Disc. Data 9(1), 1–43 (2014)

  30. Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of the 2012 IEEE International Conference on Data Mining (ICDM), pp. 745–754 (2012)

Acknowledgment

The authors thank Yoshinobu Kawahara for insightful discussions. This work was partially supported by JSPS KAKENHI 26880013 and Grant-in-Aid for JSPS Fellows 26-4555.

Author information

Corresponding author

Correspondence to Mahito Sugiyama.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sugiyama, M., Otaki, K. (2015). Detecting Anomalous Subgraphs on Attributed Graphs via Parametric Flow. In: Murata, T., Mineshima, K., Bekki, D. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2014. Lecture Notes in Computer Science (LNAI), vol 9067. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48119-6_26

  • DOI: https://doi.org/10.1007/978-3-662-48119-6_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48118-9

  • Online ISBN: 978-3-662-48119-6

  • eBook Packages: Computer Science (R0)
