Abstract
Online chatrooms serve as vital platforms for information exchange among software developers. Because many developers communicate rapidly and across diverse topics, the resulting chat messages are complex and unstructured. To make extracting information from chat threads more efficient, automatic mining techniques have been introduced for thread classification. However, previous approaches still suffer from unsatisfactory classification accuracy because of two primary challenges: they struggle to capture long-distance dependencies within chat threads, and they do not adequately address category imbalance in labeled datasets. To overcome these challenges, we present EAEChat, a topic classification approach for chat information types. Specifically, EAEChat comprises three core components. The text feature encoding component captures contextual text features with an encoder based on multi-head self-attention, and employs a siamese network to mitigate the overfitting caused by limited data. The data augmentation component expands the categories that have few samples in the training dataset, using a technique tailored to developer chat messages, and thereby tackles the imbalanced category distribution. The non-text feature encoding component employs a feature fusion model to integrate deep text features with manually extracted non-text features. Evaluation across three real-world projects demonstrates that EAEChat achieves an average precision, recall, and F1-score of 0.653, 0.651, and 0.644, respectively, a significant 7.60% improvement over the state-of-the-art approaches. These findings confirm the effectiveness of our method in classifying developer chat messages in online chatrooms.
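The reported average F1-score (0.644) is lower than both the average precision (0.653) and the average recall (0.651). A single F1 value computed from one precision/recall pair always lies between the two, so the figures are consistent with F1 being computed per class (or per project) and then averaged. The abstract does not say which averaging scheme was used, so the sketch below is only one possible reading, and the per-class numbers in it are hypothetical and not taken from the paper.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(pairs):
    """Average precision, recall, and F1 over (precision, recall) pairs.

    Each pair stands for one class or one project; F1 is computed per pair
    first and only then averaged (an assumption about the averaging scheme).
    """
    n = len(pairs)
    avg_p = sum(p for p, _ in pairs) / n
    avg_r = sum(r for _, r in pairs) / n
    avg_f1 = sum(f1(p, r) for p, r in pairs) / n
    return avg_p, avg_r, avg_f1


# Hypothetical per-class (precision, recall) values, for illustration only.
pairs = [(0.9, 0.5), (0.5, 0.9), (0.6, 0.6)]
avg_p, avg_r, avg_f1 = macro_average(pairs)
print(f"{avg_p:.3f} {avg_r:.3f} {avg_f1:.3f}")  # prints "0.667 0.667 0.629"
```

With these illustrative inputs the averaged F1 falls below both the averaged precision and the averaged recall, whereas `f1(avg_p, avg_r)` would sit between them, which matches the pattern in the reported figures.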
Index Terms
- Analyzing and Detecting Information Types of Developer Live Chat Threads