ABSTRACT
Large Language Models (LLMs) are changing how developers create software, with natural language prompts increasingly replacing code as the primary driver. While many initial assessments suggest that such LLMs improve developer productivity, other studies have identified areas of the Software Development Life Cycle (SDLC) and the developer experience where these tools fall short. Many studies are dedicated to evaluating LLM-based, AI-assisted software tools, but the lack of standardization across studies and metrics hinders the adoption of common metrics and the reproducibility of results. The primary objective of this survey is to assess recent user studies and surveys that evaluate different aspects of the developer experience with code-oriented LLMs, and to highlight the gaps among them. We leverage the SPACE framework to enumerate and categorise metrics from studies that conduct some form of controlled user experiment. In a Generative-AI-assisted SDLC, the developer experience should encompass the ability to perform the task at hand efficiently and effectively, with minimal friction, using these LLM tools. Our exploration yields several critical insights: a complete absence of user studies on the collaborative aspects of teams, a bias towards certain LLM models and metrics, and a lack of diversity in metrics within the productivity dimensions. We also propose recommendations to the research community that would bring greater conformity to the evaluation of such LLMs.
- Naser Al Madi. 2022. How readable is model-generated code? Examining readability and visual inspection of GitHub Copilot. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 1–5.
- Rohan Anil, Andrew M Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, et al. 2023. PaLM 2 technical report. arXiv preprint arXiv:2305.10403 (2023).
- Shraddha Barke, Michael B James, and Nadia Polikarpova. 2022. Grounded Copilot: How programmers interact with code-generating models. CoRR arXiv 2206 (2022).
- Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2023. Taking Flight with Copilot. Commun. ACM 66, 6 (2023), 56–62.
- Victor Dibia, Adam Fourney, Gagan Bansal, Forough Poursabzi-Sangdeh, Han Liu, and Saleema Amershi. 2022. Aligning Offline Metrics and Human Judgments of Value of AI-Pair Programmers. arXiv preprint arXiv:2210.16494 (2022).
- Mikhail Evtikhiev, Egor Bogomolov, Yaroslav Sokolov, and Timofey Bryksin. 2023. Out of the BLEU: How should we assess quality of the code generation models? Journal of Systems and Software 203 (2023), 111741.
- Nicole Forsgren, Margaret-Anne Storey, Chandra Maddila, Thomas Zimmermann, Brian Houck, and Jenna Butler. 2021. The SPACE of Developer Productivity: There's more to it than you think. Queue 19, 1 (2021), 20–48.
- Saki Imai. 2022. Is GitHub Copilot a substitute for human pair-programming? An empirical study. In Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings. 319–321.
- Ellen Jiang, Edwin Toh, Alejandra Molina, Kristen Olson, Claire Kayacik, Aaron Donsbach, Carrie J Cai, and Michael Terry. 2022. Discovering the syntax and strategies of natural language programming with generative language models. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 1–19.
- Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, et al. 2023. StarCoder: may the source be with you! arXiv preprint arXiv:2305.06161 (2023).
- OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
- Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, et al. 2023. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950 (2023).
- Gustavo Sandoval, Hammond Pearce, Teo Nys, Ramesh Karri, Brendan Dolan-Gavitt, and Siddharth Garg. 2022. Security implications of large language model code assistants: A user study. arXiv preprint arXiv:2208.09727 (2022).
- Jiao Sun, Q Vera Liao, Michael Muller, Mayank Agarwal, Stephanie Houde, Kartik Talamadupula, and Justin D Weisz. 2022. Investigating explainability of generative AI for code through scenario-based design. In 27th International Conference on Intelligent User Interfaces. 212–228.
- Priyan Vaithilingam, Tianyi Zhang, and Elena L Glassman. 2022. Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models. In CHI Conference on Human Factors in Computing Systems Extended Abstracts. 1–7.
- Helena Vasconcelos, Gagan Bansal, Adam Fourney, Q Vera Liao, and Jennifer Wortman Vaughan. 2022. Generation probabilities are not enough: Improving error highlighting for AI code suggestions. In Virtual Workshop on Human-Centered AI at NeurIPS (HCAI@NeurIPS'22). Virtual Event, USA. 1–4.
- Frank F Xu, Bogdan Vasilescu, and Graham Neubig. 2022. In-IDE code generation from natural language: Promise and challenges. ACM Transactions on Software Engineering and Methodology (TOSEM) 31, 2 (2022), 1–47.
- Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Wang Yongji, and Jian-Guang Lou. 2023. Large Language Models Meet NL2Code: A Survey. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Toronto, Canada, 7443–7464. https://doi.org/10.18653/v1/2023.acl-long.411
- Albert Ziegler, Eirini Kalliamvakou, X Alice Li, Andrew Rice, Devon Rifkin, Shawn Simister, Ganesh Sittampalam, and Edward Aftandilian. 2022. Productivity assessment of neural code completion. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming. 21–29.
Index Terms
- How much SPACE do metrics have in GenAI assisted software development?