DOI: 10.1145/3570991.3570992
research-article

Customer Informatics by Embedding SMS Headers

Published: 04 January 2023

ABSTRACT

Short Message Service (SMS) headers are alphanumeric codes that identify the message sender without divulging the contents of the message. The code by itself may be uninformative about the intent of the message or the interests of the recipient. We show how embedding techniques can be applied to learn representations of SMS headers used for commercial communication. We use these embeddings to 1) discover insightful header cohorts and 2) create customer embeddings, which are then applied as features in supervised modelling tasks such as lookalike modelling and gender prediction. The experimental results show that the customer embeddings improve the performance of these models and also emerge as top features. This derived intelligence improves customer experience, product offerings and advertisement yield. To the best of our knowledge, this is the first application of representation learning to SMS headers.
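
The two uses of header embeddings named in the abstract can be illustrated with a minimal sketch. It assumes the header vectors have already been learned by some embedding model, and that a customer embedding is formed by simple averaging; the abstract does not state the aggregation the authors use, so averaging is an assumption. The header names, vector values, and function names below are hypothetical.

```python
import math

# Hypothetical header embeddings. In practice these would come from a trained
# embedding model; the names and values here are made up for illustration.
HEADER_VECTORS = {
    "BANKAA": [0.9, 0.1, 0.0],
    "BANKBB": [0.8, 0.2, 0.1],
    "FOODXX": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def customer_embedding(headers, vectors=HEADER_VECTORS):
    """Average the vectors of the headers a customer received.

    Headers without a learned vector are skipped. Returns None when no
    header is known. Averaging is an assumed aggregation, not the paper's.
    """
    known = [vectors[h] for h in headers if h in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def nearest_headers(header, vectors=HEADER_VECTORS, k=2):
    """Return up to k other headers ranked by cosine similarity to `header`."""
    query = vectors[header]
    scored = [
        (cosine(query, vec), name)
        for name, vec in vectors.items()
        if name != header
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```

Under these assumptions, `nearest_headers` gives a crude view of header cohorts (headers whose vectors lie close together), and the list returned by `customer_embedding` could be passed as numeric features to a supervised model.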

    • Published in

      CODS-COMAD '23: Proceedings of the 6th Joint International Conference on Data Science & Management of Data (10th ACM IKDD CODS and 28th COMAD)
      January 2023
      357 pages
      ISBN: 9781450397971
      DOI: 10.1145/3570991

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 197 of 680 submissions, 29%
