Profile Fusion in Social Networks: A Data-Driven Approach

  • Chapter
Social Media Analysis for Event Detection

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

User matching across social networks has received significant attention in recent years. Several approaches have been evaluated, including discrete user attributes, text mining, network analysis, and, more recently, machine learning. However, publicly available labeled datasets for this task are lacking. Our contribution is twofold. First, we create an open-source framework that collects profiles from various social networks and identifies the true pairs of accounts corresponding to the same user by leveraging user attributes and computer vision. We present a case study dataset that encompasses more than 27k anonymized profile pairs from Quora and Twitter with their corresponding content: 33M tweets and 1.1M Quora answers. Second, we evaluate different user linkage schemes and text representation models for identifying users across these two social networks and discuss the limitations of each approach. Our experiments show that users can be identified with up to 84% accuracy when their social accounts contain a sufficient amount of user-generated content.

Notes

  1. https://datareportal.com/reports/digital-2019-global-digital-overview.

  2. https://www.pewinternet.org/2018/03/01/social-media-use-in-2018/.

  3. Zenodo: https://zenodo.org/record/3837711#.Xvr1uJZRU-I.

  4. https://expandedramblings.com/index.php/quora-statistics/.

  5. https://foundationinc.co/lab/quora-statistics/.

  6. Github: https://github.com/banyous/quora-twitter-scrapping.

  7. Github: https://github.com/twintproject/twint.

  8. Selenium: https://selenium-python.readthedocs.io/.

  9. Zenodo: https://zenodo.org/record/3837711#.Xvr1uJZRU-I.

  10. Github: https://github.com/banyous/Quora-and-Twitter-crawler-and-user-matcher.

  11. Github: https://github.com/saffsd/langid.py. We analyze users' languages by extracting the dominant language of each user profile and then counting the frequency of each detected language across the dataset. We apply the same frequency counting to the locations detected in users' bios. Table 3 summarizes the top detected languages and locations from the two social networks. Unsurprisingly, English is the dominant language on both platforms, accounting for more than 99% of Quora posts and 91% of Twitter posts. Twitter is more linguistically diverse than Quora, owing to its more social character. Another interesting observation is that Hindi is the third most used language on Quora. This can be explained by the prominence of South Asian locations, such as Pakistan, Mumbai, and Chennai, among the top detected Quora locations. This in turn relates to our starting point, Quora, which is quite prominent in India (Quora stats: https://www.alexa.com/siteinfo/quora.com). It is not surprising that Indian locations are frequent in the Twitter data as well.
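The language analysis described in note 11 (take the dominant language of each profile, then count how often each language occurs across the dataset) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `profiles` layout (user id mapped to a list of posts), and the `toy_detect` stand-in detector are assumptions made for the example. With langid.py installed, a detector such as `lambda text: langid.classify(text)[0]` could be passed in instead.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def toy_detect(text: str) -> str:
    """Hypothetical stand-in detector: 'hi' if the text contains any
    Devanagari character, otherwise 'en'. A real run would plug in a
    proper language identifier here."""
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "hi"
    return "en"


def dominant_language(posts: List[str],
                      detect: Callable[[str], str]) -> Optional[str]:
    """Return the most frequently detected language among one user's
    posts, or None if the user has no posts."""
    if not posts:
        return None
    counts = Counter(detect(post) for post in posts)
    return counts.most_common(1)[0][0]


def language_frequencies(profiles: Dict[str, List[str]],
                         detect: Callable[[str], str]) -> Counter:
    """Count how many users have each language as their dominant one.
    Users without posts are skipped."""
    freq: Counter = Counter()
    for posts in profiles.values():
        lang = dominant_language(posts, detect)
        if lang is not None:
            freq[lang] += 1
    return freq


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {
        "user_a": ["hello world", "good morning"],
        "user_b": ["नमस्ते", "hello", "धन्यवाद"],
        "user_c": [],
    }
    print(language_frequencies(sample, toy_detect).most_common())
```

The same counting step applies to locations: replace the detector with whatever function extracts a location string from a user's bio, and count the extracted values with a `Counter` in the same way.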

Author information

Corresponding author

Correspondence to Youcef Benkhedda.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Benkhedda, Y., Azouaou, F., Abbar, S. (2022). Profile Fusion in Social Networks: A Data-Driven Approach. In: Özyer, T. (eds) Social Media Analysis for Event Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-031-08242-9_4

  • DOI: https://doi.org/10.1007/978-3-031-08242-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08241-2

  • Online ISBN: 978-3-031-08242-9

  • eBook Packages: Computer Science, Computer Science (R0)
