An Empirical Study of Web Services Topics in Web Developer Discussions on Stack Overflow

Web Services (WSs) are gaining worldwide popularity because they provide reliable and fast intercommunication for web and mobile applications. WSs are offered to client application developers through web Application Programming Interfaces (APIs), such as the YouTube API, Twitter API, and Facebook API. Because of this popularity, developers frequently discuss issues of WSs-based applications on online forums such as Stack Overflow (SO). This study aims to highlight the problems that client developers face when developing WSs-based applications, using an SO dataset. Understanding developers' conversations on SO gives insight into the frequency, difficulty, and popularity of different WSs-related problems. We downloaded 12,746 posts relevant to WSs-related issues from SO for this article and used a topic modeling technique, Latent Dirichlet Allocation (LDA), to extract topics from them. The topics are labeled and organized into categories and sub-categories according to the relationships among them, and the difficulty and popularity of each topic are analyzed. Our investigation yields several findings. First, developers focus on six categories of WSs topics on SO: Client APIs development, Data Processing, Web services Authorization, Framework Support, Web APIs, and Mobile Applications. Second, the Advantages and Disadvantages of Web Applications topic (Fused_Popularity=0.39) from the Client APIs development category has the highest prevalence, followed by the Database (DB) and Data Processing in Applications topic (Fused_Popularity=0.38) from the Data Processing category. Third, most WSs-related topics in all categories are growing quickly on SO, i.e., new questions about WSs development, deployment, and authorization are added daily. Fourth, questions of type “How” are primarily asked in the Framework Support, Client APIs development, and Web APIs categories, although many questions in the other categories are of type “What”. WSs developers also use SO to ask information-seeking questions, notably in the Data Processing and Client APIs development categories. Fifth, the topics in the Web APIs (Fused_Popularity=10.8) and Client APIs development (Fused_Popularity=9.35) categories are very popular on SO. Sixth, the questions in the Web APIs (Fused_Difficulty=3) and Client APIs development (Fused_Difficulty=2.25) categories are more difficult than those in the other four categories. The results of our research may be helpful for the following WSs stakeholders: WSs client application developers, WSs educators, and WSs researchers. WSs educators and researchers can learn about new methods and devise novel techniques that make challenging WSs topics easier to understand. WSs framework developers can use our extracted WSs topics and categories to learn the preferences of WSs developers, which may support them in upgrading existing frameworks or developing new ones.

empirical knowledge about the topics related to problems of client developers and the development of tools for analyzing this type of information.
In this article, we aim to discover the problems that client developers face when using WSs by analyzing what they discuss on SO [16], a popular code-oriented discussion forum. Our study investigates the WSs-related posts on SO to identify topics relevant to WSs-based client applications. We analyze over 8.7 K WSs-related posts on SO using topic modeling techniques [17] together with quantitative, statistical, and manual analyses to understand the WSs-related discussion topics. In particular, our research focuses on the following research questions:

RQ1. What topics are discussed by the developers to resolve the web services-related problems on SO?
We found that developers ask questions related to 36 main WSs topics that are grouped into six categories. The categories are Client APIs development, Data Processing, Web services Authorization, Framework Support, Web APIs, and Mobile Applications. The most popular questions are on WSs deployment, architecture, and integration issues of applications.

RQ2. How do web services related issues evolve with time on SO?
The findings of RQ1 show that developers discuss a variety of WSs-related issues on SO. Furthermore, WSs are an evolving paradigm: new structures, methods, and framework support keep appearing, and they apply across the six proposed high-level categories. Developers are also interested in how topics and categories evolve with time. Therefore, we examine how developers' conversations about WSs topics on SO evolve over time. We calculated each category's absolute impact as the average number of new questions added to the category per month. We observe that the absolute impact of all WSs topic categories has been rising on SO since 2008, possibly for reasons such as the availability of APIs for diverse types of applications, the use of multiple APIs in a single application to attract more users, and interoperability. In addition, we calculated the relative impact of each WSs topic category as the number of new posts added to that category per month compared with the other categories. We observe a significant increase in the number of posts on Web APIs deployment issues asked each month compared with the other categories.

RQ3. What types of questions are discussed by the client developers about web services?
Developers primarily use SO as a source of learning for implementing procedures, finding practical examples, and debugging. This shows a need for better Web API documentation that provides real scenarios and more information for web and mobile application developers. The SO discussions can tell us more about the challenges client developers face in using WSs. To understand these difficulties, we differentiate between the kinds of posts that developers raise in the various WSs topics. To address this RQ, a significant sample of questions (1,765) from the dataset is manually labeled. We followed the approach that Abdellatif et al. [18] used for SO posts and categorized questions into six categories: How, What, Why, information-seeking, information-giving, and Other.

RQ4. How are the Difficulty and Popularity of web services problems changing on SO?
We observe that the most difficult topics relate to Web APIs. On the other hand, posts on the "software development" and "Advantages and Disadvantages of Web Applications" topics of the traditional "Client APIs development" category are more frequently answered by developers on SO. However, we did not find any significant correlation between the popularity and the difficulty of the WSs topics.
In addition to the identified WSs topics from SO, we analyzed the evolution of the WSs topics on SO and found that the WSs-related discussions have increased substantially since 2008. We also compared the WSs topics to other mature SE fields (e.g., mobile and security) in terms of popularity and difficulty. Our results show that the WSs community needs more effort to compete with the maturity level of other domains (i.e., mobile and security). Our findings reveal that WSs providers need to improve their current documentation and integration with web and mobile applications.
Moreover, our study findings give research directions on the most popular and challenging WSs topics. Client application developers can use the popularity of topics to know about emerging WSs. WS educators and investigators can propose new architectures, methods, and tools to simplify complex WSs topics.
The related work of our study is presented in Section II. The study design is described in Section III. Section IV presents the results of our empirical analysis. The implications of our research are discussed in Section V. The threats to validity are addressed in Section VI, and Section VII concludes the findings of this study.
Zhou et al. [57] investigated defects in industrial microservice systems and debugging procedures using 22 typical situations. They listed six types of defects, such as WS communication defects, functional flaws, etc. Belkhir et al. [61] investigated the usage of WSs in various applications of Android. They studied the aspects of Android RESTful APIs in the existing studies and came up with a list of seven best practices.
The effect of WSs evolution on various client applications was explored by Espinha et al. [49] in 2014. They identified multiple concerns caused by WSs evolution (breaking changes and volatility) and provided suggestions for WSs providers to simplify evolution tasks for client developers. They evaluated the commits of ten client applications on GitHub that used four prevalent WSs, such as Google Maps and Twitter. By assessing 67 endpoints of 14 WSs, Hosono et al. [48] conducted an experimental investigation of the validity of WSs documentation. They found four types of disparity between documentation and endpoints. Li et al. [51] presented an empirical analysis of cloud WSs-related issues in commercial cloud platforms using various discussion forums. They compiled a list of problems and flaws connected to cloud WSs.
Oumaziz et al. [62] conducted empirical research to understand how Android applications use RESTful services, studying 15 popular WSs and 500 software apps. The best practices for Android developers were determined by administering an online survey. They observed that Android developers tend to use the specialized service libraries of the various service providers. They also highlighted many critical aspects of service libraries, such as consistent terminology and comprehensive documentation. Rodriguez et al. [63] studied whether the concepts and rules of the RESTful architectural style are implemented in practice. They investigated RESTful HTTP requests in more than 78 GB of HTTP data gathered by Telecom Italia (a major mobile internet provider).
Rapoport et al. [64] evaluated 20 Android applications' web queries and introduced Stringoid, a tool for scanning string concatenation processes. Neumann et al. [65] investigated 26 technical aspects of 500 RESTful WSs, such as the degree of conformity with REST architectural principles and best practices. They suggested creating good quality services based on their findings of many technology trends, such as broad JSON support. Over the course of 11 months of longitudinal research, Cummaudo et al. [56] investigated three significant computer vision services. They discovered that the WSs act inconsistently over time and that the documentation has inconsistencies in the approach descriptions.
Using a variety of WSs or industrial scenarios, the preceding works mainly analyze 1) the challenges of client developers when using WSs in web and mobile application development and 2) certain types of WSs concerns (e.g., evolution of WSs, interoperability, and usage). However, Venkatesh et al. [60] reported limitations in their study of WSs-related issues, such as keyword collections that failed to surface concrete problems. Furthermore, none of these earlier studies explicitly examined developers' challenges in creating apps that employ WSs. There is still a lack of studies on the client-side issues of the thousands of WSs offered by various service providers on the World Wide Web. In this study, we investigate the problems of client application developers using the questions posted by web developers on SO.

A. STACK OVERFLOW-BASED EMPIRICAL STUDIES
The massive collection of SO questions and answers covers a wide range of software development topics. Many primary studies in the literature use the SO dataset [47], [61], [66], [67], [68], [69], [70], [71]. Wan et al. [66] applied the LDA model to blockchain-related SO questions to examine the issues discussed. Through a qualitative examination of SO questions, Nasehi et al. [68] explored the characteristics that make good code examples. Wang et al. [70] investigated the associations within developers' discussions using the distributions of questions and their responses in the SO dataset. In this paper, we use the SO dataset to investigate the issues client developers face when using various WSs. We also use a sample of SO questions to discover the categories of issues associated with employing WSs.

B. TOPIC MODELING
Topic modeling is a set of methodologies, tools, and procedures for organizing, comprehending, and summarizing massive text-based data [72]. It helps discover hidden relationships between patterns related to specific topics in a collection of text-based content [73]. It also surfaces topics, i.e., collections of words that best describe the content of the texts [74]. Topic modeling grew out of generative probabilistic modeling in the 1980s [72]. Such models relate observable and unseen (latent) factors and capture the probabilistic links among observations, which helps generate relevant and representative topics from any dataset [75]. TF-IDF weighting, commonly used for feature extraction, was the first approach proposed in topic modeling [76]. Deerwester et al. [77] proposed a dimensionality-reduction method called Latent Semantic Indexing or Latent Semantic Analysis (LSI/LSA) as an alternative to TF-IDF. LSI uses singular value decomposition to factorize the TF-IDF matrix.
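To make the TF-IDF weighting mentioned above concrete, the following minimal sketch computes it for a toy corpus. It assumes one common variant (term frequency normalized by document length, and idf = log(N/df)); other variants exist, and the toy documents and the function name tf_idf are illustrative only, not part of the study.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = relative frequency in this document; idf = log(N / df).
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (illustrative tokens, not taken from the SO dataset).
docs = [
    ["rest", "api", "json"],
    ["soap", "api", "xml"],
    ["rest", "json", "json"],
]
weights = tf_idf(docs)
# "soap" occurs in one document only, so it outweighs the more common "api".
print(weights[1]["soap"] > weights[1]["api"])  # True
```

A term that occurs in every document receives weight zero under this variant, which is why TF-IDF emphasizes terms that discriminate between documents.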
Topic modeling is particularly useful for textual documents. It has also been used to analyze environmental data [78], bioinformatics data [72], and social science data [79]. It was used to classify datasets based on similar sequence structures [80], group social media customers with equivalent post content [79], and classify genomic data based on related sequence structures [81].

C. USE OF TOPIC MODELING IN SE RESEARCH
Unstructured software datasets were increasingly mined using topic modeling [82]. It has been used in various SE tasks, such as code comprehension and multiple domains, ranging from OOP and the Internet of Things to WSs. Software-Feature-Network was used to discover semantic characteristics in OOP applications at the class level [83]. In contrast, topic-based information extraction from program codes using LDA gave insight into software systems [84]. Feature recognition was used to find relationships between requirements documentation and the source code [85].
Poshyvanyk et al. [85] integrated formal concept analysis with the LSI model to link ideas in textual change requests with relevant portions of a source code. Dit et al. [86] used dynamic analysis to locate features employing Web-mining, data fusion, and LSI algorithms. Nie et al. [87] exploited the LDA model to discover exciting aspects of source code by measuring the topic cohesiveness based on a program dependency network.
Maintenance challenges in software applications were also detected using a statistical topic modeling method [88]. Hu et al. [89] used the association strength of data to forecast the defect-proneness of source code. Xie et al. [90] proposed a tool (Dretom) to help developers fix issues in program code. Dretom used topic modeling and the developers' experience to resolve various code issues. Zhang et al. [91] integrated the topic modeling technique with the developer's responsibility as a fault indicator to identify developers' primary knowledge areas. Yang et al. [91] proposed a method for recommending bug solutions based on similarity across malware topics. Recently, topic modeling was used to understand program logging [92]. Other applications of topic modeling include feature location and concept extraction [93], [94], traceability link retrieval [95], [96], history of source code [89], [97], [98], searching of code components [99], refactoring [100], explanation of program faults [88], and handling of maintenance processes [101].
In another approach, Sun et al. [74] used the LDA topic modeling technique to address the usage problems that users face due to web API evolution. They performed an empirical study on 32 web APIs of 7 different types using 92,471 opinions collected from developer discussion forums. Uddin et al. [102] conducted a case study on an opinionated benchmark dataset of 4522 sentences from 1338 Stack Overflow posts to analyze different aspects of web APIs. They developed and deployed the OPINER tool 2, which directly fetches opinionated data from discussion forums. Uddin et al. [103] also investigated complex web API problems, such as selecting the right API from many competing APIs, availability of learning resources, and usability.
The prior work demonstrates that topic modeling is essential in SE research on text-based datasets, which motivated us to apply topic modeling to analyze WSs-related discussions [101], [104], [74]. This study uses topic modeling to investigate SO datasets relevant to WSs. The goal is to construct abstractions of the developers' discussions about WSs on SO in the form of topic sets. We apply the LDA model [105] to the SO dataset to discover topics. LDA is a probabilistic topic modeling technique for extracting topics from software repositories and is commonly used in SE research [74], [101], [104]. Other methods, such as Probabilistic Latent Semantic Analysis, produce topics that are less interpretable and more convoluted than those generated by the LDA model [75], [106].

III. DESIGN APPROACH
Section III-A describes our data collection strategy for obtaining WSs-relevant posts on SO. Section III-B explains the pre-processing of our collected dataset and the application of the Gensim and MALLET LDA algorithms to the relevant datasets to identify the WSs-related problems of client application developers.

A. DATA COLLECTION
To collect the posts related to WSs on SO, we performed three steps: i. downloaded the SO dataset, ii. identified the WSs tag-set inside the dataset, and iii. identified the WSs-related posts within the dataset based on the identified WSs tag-set. The steps are explained in detail below.

1) DOWNLOAD SO DATASET
SO is among the most popular web forums where programmers address issues relevant to software development tasks [103], [105]. We used the data dump of Stack Exchange 3 from March 2022, which was the most recent dataset accessible at the time of our investigation. The dataset contains 233,785 questions and answers and spans 14 years, from 2008 to March 2022, with around 60,428 questions and 173,356 responses. The details of the dataset are given in Table 1.
Each dataset entry contained the following information: 1. content, including text and code examples; 2. dates of creation and last update; 3. view count, favorite count, and score of the post; 4. the user ID of the post's creator; and 5. the tags assigned to the question by its creator. Only the asker of a question can accept a proposed solution to it. A post may carry 1 to 5 tags.

3 https://archive.org/details/stackexchange

2) SEARCHING WSs TAG-SET
Not all questions and answers on SO were related to WSs, so we needed to identify the WSs-related posts. To do so, we used the tags that developers assigned to each post in the dataset and identified the tag set that could retrieve all posts connected to WSs. We followed the approach of Yang et al. [107] to build this WSs tag-set. First, we identified the five most basic and popular WSs-related tags on SO, denoted B-tags. Second, we gathered all posts tagged with a B-tag and collected all the tags assigned to these posts. We go through these steps in detail below.

i. Basic WSs Tags Identification
A large number of WSs-related posts in the SO dataset were tagged with web-services, so this tag is the intuitive starting point for finding WSs-related entries on SO. In March 2022, we searched for posts tagged with web-services using the SO search engine. It returned the posts tagged with web-services together with a collection of 25 other tags relevant to these posts, such as REST, WCF, WSDL, and JSON. These tags usually appear together with the web-services tag on SO. Some of the 25 tags relate to the architecture of WSs, such as SOAP, REST, and WCF. Some relate to frameworks used in developing client applications, such as ASP.NET, Android, and iOS. Others relate to the languages and data formats of WSs or APIs, e.g., PHP, JavaScript, Python, XML, and JSON. As a result, the following were considered basic tags (B-tags): 1. Web-services, 2. REST, 3. SOAP, 4. API, 5. Micro-services. These are fundamental WSs tags because web services, REST, SOAP, API, and micro-services are the five most notable aspects of WSs relevant to the development of client applications.
The SO search engine thus surfaced the five B-tags related to WSs. We started the search from the questions on SO tagged with web-services. Stack Overflow does not document how its search engine selects related tags. However, we found discussions on Stack Exchange meta sites in which people asked about related tags. One developer asked, "How does Stack Overflow suggest related tags?", and another client developer asked a similar question, "What are these tags related to the Newest Questions page?". According to the responses to these questions, related tags are tags that usually appear together on SO. SO also provides an API endpoint for searching inter-related tags: it takes the name of a tag and returns a list of related tags. As noted above, not all 25 tags were relevant to WSs; for example, one related tag was C#, which carried only generic C# programming questions.
Furthermore, it was impossible to manually analyze all posts tagged 'C#' to separate the WSs-related ones unless those posts were also tagged with a WSs tag. It is also probable that some WSs developers asked WSs-related questions using tags other than 'web-services'. Evidently, relying on the web-services tag alone could not collect all posts relevant to WSs.
Consequently, we examined each of the 25 related tags and first selected five fundamental tags: web-services, REST, SOAP, API, and Micro-services. We then used these five basic tags as the seed of our tag-expansion approach, which we applied to the whole SO dataset. Previous studies have used this approach to extract posts relevant to topics in various domains, such as big data [108], concurrency [109], Android applications [110], and chatbot discussions [18].

ii. Collecting Final WSs Tag-set
Intuitively, developers use many tags beyond the B-tags to discuss WSs-related issues on SO. We denote the whole SO data dump as W. Using the B-tags, we gathered all posts tagged with at least one B-tag into a sub-dataset SD. We then collected all tags occurring in SD as candidate WSs tags (WST). However, not all of these tags relate to WSs issues, e.g., C# and PHP. Therefore, we followed the guidelines of previous works [107], [108] to remove irrelevant tags from WST. The significance (S) and relevance (R) of each tag in WST were calculated over SD to find the most relevant WSs tags. The results showed that S = 0.3 and R = 0.01 gave the maximum number of tags relevant to WSs, which is consistent with the findings of prior studies [107], [108], [109]. We performed the steps below in a Python notebook to extract the list of WSs tags using the significance and relevance values of each tag T.
i. We extracted all tags that co-occurred with the five selected tags in the sub-dataset SD. This process provided us with 726 WSs tags.
ii. We calculated the number of posts associated with each tag T in the SD and W datasets, and from these the significance and relevance of each tag. This helped us find a threshold value pair for choosing the subset of the 726 WSs tags used to extract the posts most relevant to developers' WSs concerns (results for only 5 WSs tags are given in Table 2; complete results are available at the online dataset link). For example, for S = 0.25 and R = 0.001, we got 34 WSs tags. The tags selected by a threshold pair were denoted Recommended Tags. Every tag in the Recommended Tags list had to be relevant to WSs, so we verified this manually by reading the description of each tag on SO. For example, we did not consider the tag 'Python' relevant to WSs. The tag wiki 4 (Stack Overflow 2022) defines this tag as "Python is a multi-paradigm, dynamically typed, multi-purpose programming language. It was designed to be quick to learn, understand, use, and enforce a clean and uniform syntax". In contrast, we considered Twitter relevant to WSs: "It is a microblogging service that lets users post short 'Tweets' of up to 280 characters. These can also be posted via its API". Finally, the manual work at threshold (S = 0.25
ii. We developed a final list by gathering the tags identified as relevant in all experiments. The final list contained 54 tags that were found relevant to WSs through our manual analysis. These tags covered
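The tag-expansion steps can be sketched as follows. The text does not spell out the formulas, so this sketch assumes the definitions commonly used in the cited tag-expansion studies: significance is the share of a tag's posts in W that fall inside SD, and relevance is the share of SD posts that carry the tag. The toy posts, the thresholds, and the function names are illustrative assumptions, not the authors' notebook.

```python
from collections import Counter

def tag_counts(posts):
    """Count how many posts carry each tag; each post is a set of tags."""
    return Counter(tag for tags in posts for tag in tags)

def expand_tags(all_posts, base_tags, min_significance, min_relevance):
    """Return the tags of SD that pass both thresholds (assumed definitions)."""
    # SD: posts of the whole dump (W) carrying at least one base tag.
    sd = [tags for tags in all_posts if tags & base_tags]
    in_w, in_sd = tag_counts(all_posts), tag_counts(sd)
    selected = set()
    for tag, n_sd in in_sd.items():
        significance = n_sd / in_w[tag]  # share of the tag's posts that are in SD
        relevance = n_sd / len(sd)       # share of SD posts that carry the tag
        if significance >= min_significance and relevance >= min_relevance:
            selected.add(tag)
    return selected

# Toy dump W: one set of tags per post (made-up data).
W = [
    {"web-services", "wsdl"},
    {"rest", "json"},
    {"soap", "wsdl"},
    {"c#", "linq"},
    {"c#", "wpf"},
    {"c#", "rest"},
    {"python"},
]
B_TAGS = {"web-services", "rest", "soap", "api", "microservices"}

ws_tags = expand_tags(W, B_TAGS, min_significance=0.5, min_relevance=0.25)
print(sorted(ws_tags))  # ['json', 'rest', 'soap', 'web-services', 'wsdl']
```

In this toy run, c# is dropped because only one of its three posts falls in SD (significance of about 0.33), which mirrors why generic language tags are filtered out before the manual review.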

3) IDENTIFICATION OF WSs QUESTIONS USING SELECTED 54 TAGS
All questions on SO tagged with at least one of the 54 identified WSs tags were included in our final dataset.
As in previous studies [61], [108], [111], we used the identified tags and found a total of 12,497 questions in our SO dataset. Duplicate posts were removed to eliminate noise and reduce the dataset size. After removing 3,924 duplicate posts with the help of Python's NLTK library, the final SO dataset (FDS) contained 8,573 questions. This final dataset was then used in topic modeling to mine the topics that client developers discussed on SO while using various WSs in software application development. The word cloud in Figure 2 shows that the text in the FDS has a significant relationship with WSs.
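The selection and de-duplication step can be illustrated with a small sketch. The text states that duplicates were removed with the help of NLTK but does not give the criterion, so the exact-match comparison after whitespace and case normalization used here is an assumption, as are the field names id, tags, and body.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()

def select_questions(questions, ws_tags):
    """Keep questions with at least one WSs tag, dropping repeated bodies."""
    seen, kept = set(), []
    for question in questions:
        if not set(question["tags"]) & ws_tags:
            continue  # no WSs tag attached: not part of the dataset
        key = normalize(question["body"])
        if key not in seen:  # first occurrence of this body wins
            seen.add(key)
            kept.append(question)
    return kept

# Made-up questions; the second is a duplicate of the first.
questions = [
    {"id": 1, "tags": ["rest", "json"], "body": "How do I call a REST API?"},
    {"id": 2, "tags": ["rest"], "body": "how do I  call a REST api?"},
    {"id": 3, "tags": ["c#"], "body": "How do I sort a list?"},
]
kept = select_questions(questions, {"rest", "soap"})
print([q["id"] for q in kept])  # [1]
```

With the reported counts, the same bookkeeping gives 12,497 - 3,924 = 8,573 questions in the FDS.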

B. TOPIC MODELING PROCESS
The following steps were performed to extract WSs topics from the posts in FDS: 1. pre-processing of the posts in FDS, 2. searching for an optimum number of discussed topics, and 3. discovering the WSs-related topics. These steps are detailed below.

1) PRE-PROCESSING OF POSTS IN FDS
The questions in FDS were preprocessed to remove noise with the help of Python's NLTK library and regular expressions. We used the noise-cleaning process adopted by previous studies [107], [108], [111] and performed the following steps.
i. All non-text blocks, such as code chunks inside the code tags "<code></code>" and HTML tags such as "<p></p>" and "<a></a>", were removed with the help of regular expressions.
ii. The textual data contained stop words, such as "a", "an", "and", "on", and "this", as well as numbers, punctuation, special letters, and symbols, which were removed using the NLTK 6 and MALLET [112] libraries. These libraries provide sets of stop words to remove from textual datasets. We added some new stop words, such as edu, from, re, and use, to the NLTK list to improve the quality of our dataset. This is common practice in NLP text processing to ensure that the modeling process emphasizes the most informative material.
iii. We used the lemmatization feature of NLTK to reduce words to their base forms, which improved contextual comprehension of the textual data by unifying variants of the same word and reducing variation in the dataset. For example, the words "watching", "watched", and "watcher" are all lemmatized to the root word "watch". We used the lemmatization of the NLTK library instead of Porter stemming [113]. Porter stemming reduces words to stems: "configuration", "configured", and "configure" are all reduced to "configur". We preferred lemmatization because it reduces a word to a verb or noun rather than a truncated stem, improving the comprehension of the dataset.
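The three pre-processing steps can be sketched in a few lines. This is a simplified stand-in, not the authors' pipeline: it uses only the standard library, a tiny illustrative stop-word set instead of the NLTK and MALLET lists, and a hypothetical lookup table LEMMAS in place of the NLTK lemmatizer.

```python
import html
import re

# Illustrative stop words; the last four are the custom ones named in the text.
STOP_WORDS = {"a", "an", "and", "on", "this", "the", "to", "is", "in", "i",
              "edu", "from", "re", "use"}
# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"watching": "watch", "watched": "watch", "configured": "configure"}

CODE_BLOCK = re.compile(r"<code>.*?</code>", re.DOTALL | re.IGNORECASE)
HTML_TAG = re.compile(r"<[^>]+>")
TOKEN = re.compile(r"[a-z]+")

def preprocess(post_body):
    """Strip code and markup, drop stop words, and map tokens to lemmas."""
    text = CODE_BLOCK.sub(" ", post_body)  # step i: remove code chunks entirely
    text = HTML_TAG.sub(" ", text)         # step i: remove tags, keep their text
    text = html.unescape(text).lower()
    tokens = TOKEN.findall(text)           # step ii: letters only, no digits or symbols
    return [LEMMAS.get(t, t) for t in tokens if t not in STOP_WORDS]  # steps ii and iii

body = ("<p>I watched the <a href='x'>REST</a> API return 404 on this call:</p>"
        "<code>requests.get(url)</code>")
tokens = preprocess(body)
print(tokens)  # ['watch', 'rest', 'api', 'return', 'call']
```

Removing the code block before the generic tag pattern matters: otherwise the code text itself would survive as ordinary tokens.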

2) SEARCHING OPTIMAL NUMBER OF TOPICS
To extract topics from the dataset, we used the Latent Dirichlet Allocation (LDA) algorithm [114] as implemented in both the MALLET [112] and Gensim [115] libraries. The preprocessed FDS data was given as input to the LDA algorithms of both libraries, which generated approximately the same lists of topics by grouping the posts into K topics. We preferred the LDA algorithm of the MALLET library [112] because of its higher coherence score compared with the Gensim library [115]. We applied the procedure of Arun et al. [116] to find the optimum number of topics (K). This approach uses the coherence measurement to guide the choice of the optimal number of topics for a textual dataset: an LDA model with a higher coherence score is likely to represent the dataset better. We measured coherence using the c_v metric of Röder et al. [117], which is available in the Gensim Python library [117]. The c_v metric was also used by Uddin et al. [102] to measure the coherence of words when discovering topics. We performed the following steps to determine the optimal number of topics for the dataset.
i. The sentences in the FDS dataset were tokenized into lists of words, removing punctuation and unnecessary characters. Gensim's simple_preprocess() function was used for this purpose [115]. After tokenization, a post looked like ['as', 'software', 'engineer', 'who', 'should', 'be', 'following', 'on', 'twitter'].
ii. Bigrams and trigrams were built from the selected dataset using the Python Gensim library. Bigrams are two words, and trigrams are three words, that frequently occur together in a document. This process increased the effectiveness of the LDA model.
iii. Two main inputs are necessary to use the LDA topic model: a dictionary (id2word) and a corpus. We created both using the Gensim library [115]. The dictionary assigns a unique id to each word, and the corpus represents each document as a mapping of (word_id, word_frequency) pairs. For example, (0, 1) means that the word with id 0 occurs once in the document. The detailed process of implementing the LDA models is as follows:
a. Building Topics: In addition to the corpus and dictionary developed in the above steps, we provided the number of topics (K) and the hyperparameters alpha and beta, which affect the sparsity of the extracted topics. Both α and β had default values of 1.0/num_topics as per the Python Gensim docs [115]. The chunk size is the number of documents used in each training chunk, the parameter update_every determines how often the model parameters are updated, and the parameter passes is the total number of training passes. We generated the Gensim simple LDA model and the LDA Multicore model by passing all the parameters above. We obtained the topics from both models and saved them in separate Excel files.
(Excel files are available at the online dataset link.) Model perplexity and topic coherence 7 provide convenient measures of how good the extracted topic model is. The topic coherence score, in particular, has been more helpful in evaluating the quality of an LDA model. The model perplexity value for the Gensim LDA model is -7.79, which indicates a good model (lower perplexity indicates a better model) [115]. The coherence score of Gensim's LDA model was 0.48. The LDA algorithm of the MALLET library [112] often provides better-quality topics: it gave a higher coherence score than the Gensim LDA models, which significantly affected the extraction of topics from the textual dataset.
b. We used the MALLET LDA model 8, which is available through a wrapper implemented in Python. We passed the dictionary and corpus developed in step (iii) and the number of topics (K) to MALLET LDA as input, and saved the output in the LDA_Mallet CSV file (available in the online dataset). This model gave a higher coherence score (0.56) than Gensim's LDA model. By changing the LDA algorithm, the coherence score improved from .037 to 0.556, which significantly affected the quality of the topics generated by this process.
c. The above MALLET LDA model was built with ten different topics, where each topic is a combination of keywords and each keyword contributes a certain weight to the topic. The keywords of each topic and the weight (importance) of each keyword can be inspected with the lda_model.print_topics() function of the Python library [115], as shown in Table 4 (only for the first six topics).
d. Topics Interpretation: The top 20 keywords that contribute to topic 0 are 0.175*"create" + 0.101*"type" + 0.094*"entity" and so on. The weight of the keyword "create" on topic 0 is 0.175. The weights reflect how important a keyword is to the topic. Looking at these keywords, we can infer what the topic is about.
We may summarize topic 0 as either ''Relationship among APP entities'' or ''Relation between classes''.
iv. We intended to build many LDA models with different numbers of topics (K) and to pick the one with the optimal number of topics, i.e., the highest coherence value, just like previous studies [107], [108], [109], [110], [111], [118]. Choosing the K that marks the end of the rapid growth of topic coherence usually yields meaningful and interpretable topics. Picking an even higher value can sometimes provide more granular sub-topics. If the same keywords are repeated in multiple topics, it is probably a sign that K is too large. The MALLET LDA was run on our dataset (FDS) for values of K from 2 to 40 with an increment of 3, following the technique of Bagherzadeh et al. [108]. We executed the MALLET LDA model once for each value of K and recorded the coherence score of each model. We used the compute_coherence_values() and plt.plot() functions 9 and trained multiple LDA models (Notebook given in dataset link). Figure 3 provides the models' details with their corresponding coherence scores and numbers of topics.
v. We selected the LDA model that had 38 topics, with the highest coherence score of 0.56. The lda_model.print_topics() function 10 was used to store all 38 generated topics into the LDA_opitmal.csv file (file is available in online dataset). We used these 38 identified topics to find the problems of client developers due to the evolution of WSs.
vi. Extraction of the WSs Topics characteristics: By using the FDS dataset and hyper-parameters, we extracted 38 topics related to WSs. The selected model gave multiple pieces of information for every topic: (1) The list of N keywords describing the topic, with the weightage of each keyword indicating the keyword's contribution to the topic. We collected 25 keywords per topic. (2) The number of posts in the dataset associated with each topic. (3) The correlation of each post with the topic (between 0 and 1) [66].
We sorted the posts for every topic in descending order to collect more relevant posts against every topic. The extracted topics were used to answer our research questions in the next section.
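The corpus representation in step iii and the search over K in step iv can be illustrated with a minimal, library-free sketch. It is not the Gensim or MALLET code used in the study: tokenize(), build_dictionary(), to_bow(), candidate_topic_counts(), and pick_best_k() are hypothetical helpers, the input sentences are made-up post titles, and the coherence values passed to pick_best_k() would have to come from an actual topic model.

```python
import re
from collections import Counter


def tokenize(text, min_len=2):
    """Lowercase, keep alphabetic runs, and drop tokens shorter than min_len."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= min_len]


def build_dictionary(tokenized_docs):
    """Assign a unique integer id to each distinct word, in order of first appearance."""
    word2id = {}
    for doc in tokenized_docs:
        for word in doc:
            if word not in word2id:
                word2id[word] = len(word2id)
    return word2id


def to_bow(tokens, word2id):
    """Represent one document as sorted (word_id, word_frequency) pairs."""
    counts = Counter(word2id[t] for t in tokens)
    return sorted(counts.items())


def candidate_topic_counts(start=2, stop=40, step=3):
    """Values of K to try: from start to stop (inclusive) in increments of step."""
    return list(range(start, stop + 1, step))


def pick_best_k(coherence_by_k):
    """Return the K whose model has the highest coherence score."""
    return max(coherence_by_k, key=coherence_by_k.get)


# Hypothetical post titles; the first one reproduces the token list quoted in step i.
docs = [
    tokenize("As a software engineer, who should I be following on Twitter?"),
    tokenize("REST API design: multiple calls vs a single call to the API"),
]
word2id = build_dictionary(docs)
corpus = [to_bow(d, word2id) for d in docs]
k_grid = candidate_topic_counts()
```

Under these assumptions, k_grid runs from 2 to 38 in steps of 3, so a model with 38 topics is one of the candidates that pick_best_k() can return.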

IV. EMPIRICAL STUDY
The following are the RQs that we answer in this section:

2) APPROACH
We first labeled topics that describe the underlying concepts to comprehend issues or problems. We adopted the open card sorting strategy of Hudson et al. [119] to manually assign labels to every topic, which was used in many earlier works on topic modeling [18], [107], [109], [110], [117]. In the open card sorting technique, the labels for topics were not predefined; instead, the labels were identified during the open coding process. We used three types of information from the extracted data to label topics: (1) keywords in the topic, (2) weightage of keywords in the topic, and (3) a list of the top 25 highly related posts to the topic.
Ten people were involved in the labeling process of topics: three were authors of this paper (two holding a Ph.D. in SE and one Ph.D. scholar), two were web developers (MSCS), and five were Ph.D. students of SE at COMSATS University Lahore. The labeling process of the topics and their relevant information (described in detail in Section III) was explained to all annotators by the first author. The 38 WSs topics were forwarded to participants in an Excel file through email. Each topic was assigned a unique id. The labeling process of topics was performed in different rounds.
Round 1. In this round, the topics were labeled by all participants using keywords, weightage of keywords, and top 25 highly related posts to the topic. The topic file was sent to all participants by author 1. All the participants labeled the topics within one week and replied to author 1.
Round 2. The labels assigned to each topic by all participants were compiled into a single Excel file by author 1. All the authors were involved in discussing each label assigned to a topic by the participants. During this process, the team of authors allocated a final label to every topic through discussion, considering all labels assigned to the topic. The discussion about each topic's label continued until the members reached a consensus. To achieve an agreement, the team had to go through more than 25 iterations, during which they discussed in person, over email, Skype, and phone (topic labeled file is given in the online dataset link).
After finalizing the labeling of topics, it was observed that some topics needed to be merged because they were equivalent but used different vocabularies, so the LDA model determined them to be different. We merged topic #34 (Web services Architecture) into topic #17 (Web APIs Architecture) because both topics were about the architecture of WSs. Similarly, topic #37 was merged into topic #28 because both topics were about the components of WSs, and the LDA algorithm placed them into separate topics due to the large range of components of WSs. After this merging process, we were left with 36 different labeled topics regarding WSs.
We reviewed the topics again to classify them into higher-level groups. We involved two web engineers (domain experts) to group these topics according to their relationship. During this process, the team members grouped relevant topics by discussing and considering all topics. The discussion about each topic's grouping continued until the members reached a consensus. To achieve an agreement, the team had to go through more than ten meetings in which they discussed in person, over email, Skype, and phone (topic grouped file is given in the online dataset link). For example, the two topics ''Relationship among application entities'' and ''Microservice vs Monolithic architecture'' were related to the development of software applications. Thus, these two topics were put into the ''Client Application Architecture'' category. This process was repeated several times to develop the topics at a higher level of abstraction. Hence, the two topics relevant to ''Client Application Architecture'' were further attached under ''Clients APIs development'', since the architecture is the fundamental part of all WSs-based applications. Other topics were also included under ''Clients APIs development'', such as ''Web Application Management'' and ''Good and bad practices''. Similarly, we grouped the relevant topics of the different APIs (development and deployment) under ''Web APIs''. We shared the coding file in our replication package to ensure that the classification is repeatable.

3) RESULTS
We discovered 36 WSs topics from our FDS. After labeling, the topics are grouped into six categories: Client APIs development, Web APIs, WSs Authorization, Data Processing, Framework Support, and Mobile Applications development. Figure 4 shows the distribution of posts over the six categories. Among these categories, Client APIs development has the highest number of questions and topics. Figure 5 shows the 36 WSs topics with their numbers of posts, arranged in decreasing order. On average, each topic appeared in 250 posts. Out of the 36 topics, most of the discussions were associated with Web APIs (370 questions), followed by micro-services and parameters of WSs (350), regarding the development, deployment, and usage of WSs in client application development. Figure 6 shows the 36 WSs topics classified into six groups based on the relationship among developers' discussions. For example, the topmost group is client APIs development, which is found in 28 There are detailed discussions about the development, model, and system-based issues of client applications due to the evolution in WSs. This topic has the following four sub-groups: Client Application Architecture includes a discussion about the relationship between entities and types of architecture of APIs client applications. This sub-group has two topics: 1. Relationship in App Entities (336 Q), which contains discussions related to the types of issues that arise among various components of client applications due to use of WSs, and 2.
3. WSs Authorization
This group contains two subgroups with five topics: 1. The client-server communication issue subgroup highlights different transmission and communication issues while using WSs in client applications, 2.
The user authentication sub-group deals with developers' different accessibility and authentication concerns regarding WSs-based applications in two topics with 539 questions on SO. These posts covered principles, procedures, and problems relevant to authentication and authorization during intercommunication between WSs and client applications.

Framework Support (769 Q)
We found four framework-related topics that cover 769 posts in the WSs-related dataset. These topics are bundled into two sub-groups: (1) Framework Compatibility, with topics related to the compatibility of frameworks and WSs, and (2) Web Component Development, covering the development and configuration of web APIs in client applications. The Framework Compatibility sub-group consists of two topics: 1. Issues of Web Application Framework (180 Q) while using various WSs, and 2. Customer Identification using the Asp.Net framework, discussed in 217 questions of client APIs developers. The Web Component Development sub-group has two topics: 1. Web Development & Java Language (213 Q), with discussions about the relationship between WSs and language compatibility, e.g., the Java language, and 2. Form Management for applications (159 Q), which discusses the connection between web APIs and forms management in various client applications.

Web APIs (1805 Q)
The category contains seven topics under three subgroups: The topic modeling is associated with the categorization of WSs topics by using the unique features of topics. For example, the Client APIs Development category pertains to using WSs to meet users' requirements for developing mobile and web applications. These requirements evolve with time, and the topics related to each category also change. We analyze this evolution to support the WSs community as it evolves and expands and to find out gaps that still require the attention of the researchers.

2) APPROACH
There are many studies published about web APIs such as web APIs usage patterns [47], temporal properties [45], documentation [46], incompatibility, deterioration, version history, and technology changes [44] by using different discussion forums' datasets (Stack Overflow, Stack Exchange, GitHub, etc.). We consider six key categories of WSs topics in this RQ reported on the SO. We investigated the relative and absolute impacts of each topic listed under all categories described below.
i. Absolute Impact of Topic: We used the topic popularity metrics described in a prior study by Han et al. [120] to calculate the popularity of a topic Tk in dataset dj over its posts Pi. The popularity of each topic is described as follows. We applied the LDA model to dataset dj to acquire a set of K topics (T1, ..., TK). We denote the probability of topic Tk in a post Pi as θ(Pi, Tk) and express the absolute impact metric of a specific topic Tk in a specific month m as:

$$\text{Absolute Impact}(T_k, m) = \sum_{P_i = 1}^{Q(m)} \theta(P_i, T_k)$$

where Q(m) is the number of questions in month m. The absolute impact of each category C for a certain month m is calculated as:

$$\text{Absolute Impact}(C, m) = \sum_{T_k \in C} \text{Absolute Impact}(T_k, m)$$

Category C is one of the six major categories of WSs topics, such as ''Client APIs development'', ''Web API'', etc.
ii. Relative Impact of Topic: As per the earlier work of Han et al. [120], we calculate the relative impact of WSs topics in a certain period using the relative impact metric. We express the relative impact metric of a topic Tk in a month m as:

$$\text{Relative Impact}(T_k, m) = \frac{1}{Q(m)} \sum_{P_i = 1}^{Q(m)} \theta(P_i, T_k) \qquad (6)$$

Here Q(m) is the total count of posts that include topic Tk in month m, and θ(Pi, Tk) is the probability of topic Tk for a post Pi. The relative impact metric calculates the percentage of posts in a given month m related to a certain topic Tk. The relative impact metric also applies to the categories of WSs topics:

$$\text{Relative Impact}(C, m) = \sum_{T_k \in C} \text{Relative Impact}(T_k, m) \qquad (7)$$

where C represents the collection of posts relevant to a specific category of WSs posts.
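As an illustration of the two impact metrics, the following sketch computes them from per-post topic probabilities. It rests on assumptions: posts are given as (month, {topic: probability}) pairs, Q(m) is taken to be the number of posts in month m, and the function names, topic labels, and sample values are hypothetical rather than taken from the study's scripts.

```python
from collections import defaultdict


def monthly_impacts(posts, topic):
    """Return (absolute, relative) impact of one topic, keyed by month.

    posts: iterable of (month, {topic_name: probability}) pairs.
    Absolute impact sums the topic's probability over the month's posts;
    relative impact divides that sum by the number of posts in the month.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, topic_probs in posts:
        totals[month] += topic_probs.get(topic, 0.0)
        counts[month] += 1
    absolute = dict(totals)
    relative = {m: totals[m] / counts[m] for m in totals}
    return absolute, relative


def category_impact(impact_by_topic, category_topics):
    """Sum per-topic monthly impacts over all topics in one category."""
    combined = defaultdict(float)
    for topic in category_topics:
        for month, value in impact_by_topic[topic].items():
            combined[month] += value
    return dict(combined)


# Made-up topic probabilities for three posts in two months.
sample_posts = [
    ("2014-05", {"A": 0.5, "B": 0.5}),
    ("2014-05", {"A": 0.25, "B": 0.75}),
    ("2014-06", {"A": 1.0}),
]
abs_a, rel_a = monthly_impacts(sample_posts, "A")
abs_b, _ = monthly_impacts(sample_posts, "B")
abs_category = category_impact({"A": abs_a, "B": abs_b}, ["A", "B"])
```

The same category_impact() helper can be applied to relative impacts, since both category-level metrics are sums over the topics in the category.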

3) RESULTS
Using the above equations, we calculated the impact of particular WSs topics and categories from 2008 to 2022.
i. Absolute Impact of Topic: We analyze the trends for the six categories of WSs topics from 2008 to 2022 in terms of the absolute impact shown in Figure 7. We notice that an upward trend begins in September 2010 and gradually increases for the Client API development category as compared to the mobile application-related topics. Data processing gets the attention of developers after December 2011. The number of posts for Web API development increased from May 2014 (19) to Feb 2020 (25). The significant growth in absolute topic impact for Client API development and Web APIs indicates the evolution of the WSs topics on SO without any point of decline until July 2021. Among the other categories, the framework support and WSs Authorization trends intersect between July 2012 and July 2016 and from March 2018 to December 2021.
We studied the most prevalent topics in the client API development and Web API development categories, particularly where their trends intersect, i.e., May 2014 to Feb 2020. The most popular topic for client API development in May 2014 is Relationship among Application entities topics, e.g., ''an entity belongs to another entity should a RESTful API provide just the id of its parent'' Q396152. Topics related to Web API architectures are also gaining popularity, e.g., ''is it good practice to consume both the REST and SOAP API for a particular service'' Q326347. We also noticed similar topics in the WSs related to the authorization for Web service Accessibility, e.g., ''how to structure path colliding rest web services with role access'' Q314261.
Similarly, in May 2014, popular topics were frequently relevant to web APIs such as REST and SOAP used in developing client applications, as in Q331716 and Q339858. Interestingly, the absolute impact of the framework support category slightly increased in July 2013, specifically due to compatibility problems of APIs with the various frameworks used in the development of web and mobile applications. Similarly, the absolute impact of the data processing category slightly increased in April 2012, mainly due to issues with the database and file management systems used in the development of API-based applications, e.g., ''caching model objects to avoid multiple SQL commits'' Q420784, and ''update XML file versus overwriting XML file'' in Q298920. The framework support category does not increase much in popularity because APIs depend little on the framework used to implement them in web and mobile applications. We observe a slight increase in popularity between July 2018 and July 2019. Figure 8 shows the trend in absolute popularity of WSs topics with peaks from mid-2010 to 2021. We also observed similar trends in Client applications and Web APIs development. The most popular topics in the WSs community are related to Web API architectures. Client Applications and Web APIs Development posts are popular for discussing the issues related to database, framework support, and WSs Authorization.
ii. Relative Impact of Topic: Using equation (6), we calculated the relative impact for all topics of the six identified categories. The distribution of topics is depicted in Figure 9, which also illustrates the relative change in the popularity of topics. We detected an overall increase in client API development-related topics from January 2011 to September 2021. WS Authorization, framework support, and mobile App and WSs-related topics also gained developers' attention from 2011, with very high relative variation from March 2017 to September 2021. A remarkable intersection trend was noticed from September 2011 to May 2021 among Client APIs development, WSs Authorization, and Web APIs.
From May 2017 to June 2019, the Framework support category focused more on programming language-related problems while using the WSs, e.g., ''how can I write a set of  functions that can be invoked from almost any programming language'' Q157536, and ''writing a language agnostic API'' Q358532.
The most popular topics in the WSs community in the client APIs development category are mainly associated with the types of WSs used in the development of Web and Mobile applications. Continuous intersections are also observed between the client APIs development and mobile application categories in 2013, 2015, and 2019. The client APIs development and mobile applications categories mostly discuss RESTful APIs usage and APIs call issues for various versions of client applications. We also observe a sharp increase in Client APIs development posts in 2021. Figure 9 shows that the number of posts increased over time in all client APIs development category topics. An increase in posts of the WSs authorization and Framework support categories was also noticed from May 2017 to 2018 and in 2021, related to the selection and usage of WSs (Q433985), Client-server architecture (Q310758), and Issues of Web application framework (Q364504).

C. RQ3. WHAT TYPES OF QUESTIONS ARE DISCUSSED BY THE CLIENT DEVELOPERS ABOUT WEB SERVICES?
1) MOTIVATION
After finding the most intriguing topics, we explored the types of questions that client applications developers asked in each WS topic category. A previous study [54] showed that the developers addressed various issues on SO using questions such as why, how, what, etc. Therefore, such a type of investigation will be very useful to determine the type of problems faced by the client application developers in various development tasks of software applications.

2) APPROACH
To comprehend the intentions of client application developers related to WSs in various posts on SO, we extracted a random sample of our SO dataset. We analyzed each question manually and labeled it according to the nature of the discussion in the posts. In addition to the how, why, and what label type of questions used by Abdellatif et al. [116], we also used additional label types for questions such as Information Giving and Information Seeking to make our findings more comprehensive. We extracted the five types of questions in two phases.
Phase I Random Sampling: There were 8,739 unique questions in our final dataset. At least 380 random questions would be required for a statistically significant sample with a 95% confidence level and a confidence interval of 5. We can get a sample representative of the complete dataset by taking a random sample of 1,765 questions. However, a random sample may skip questions from a subset of questions belonging to a particular topic group, especially if the subset is small compared to the full dataset. As stated in RQ1, the 36 topics were identified from the 8,739 posts of SO and organized into six groups: Client APIs development, Data Processing, web services Authorization, Framework Support, Web APIs, and Mobile Applications. The distribution of questions differs across the groups. Therefore, a simple random sample of 380 questions might miss various critical questions from categories with fewer questions (e.g., Mobile Application).
We therefore chose a statistically significant random sample from each of the six categories, as recommended by Abdellatif et al. [18]. Figure 10 shows the questions' distribution in the six categories' samples at a 95% confidence level and a confidence interval of 5. In total, we sampled and manually examined 1,765 questions out of a total of 8,739 questions.
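The per-category sample sizes can be approximated with a standard sample-size formula. The sketch below assumes Cochran's formula with a finite population correction, z = 1.96 for a 95% confidence level, a 5% margin of error, and p = 0.5; the text does not state which calculator the authors used, so the results may differ slightly from the reported figures. The function sample_size() is hypothetical, and only the two category sizes quoted in the results (769 and 1805 questions) are used as inputs.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction, rounded up."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)           # correct for finite population
    return math.ceil(n)


# Category sizes taken from the headings in the results of RQ1.
category_sizes = {"Framework Support": 769, "Web APIs": 1805}
category_samples = {name: sample_size(n) for name, n in category_sizes.items()}
```

Because the correction shrinks the required sample only mildly, small categories still need a sample that is a large fraction of their questions, which is why stratifying by category raises the total well above a single simple random sample.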

Phase II Labeling of Questions:
We used the prior categorization scheme of Rosen et al. [110] to label each post from our samples. How: The questions about the usage, implementation, architectures, and methodologies of web services [110]. This kind of post differs from the why and what type of posts. In how-type posts, the developers are looking for steps to use APIs in their applications, e.g. (Q35939) ''how do web APIs work?''.
Why: The posts that report the reason, cause, or intention for a given behavior or discussion are called why-type posts. The majority of these posts are related to troubleshooting. Such types of questions help application developers to comprehend and explain any approach, such as ''why does the java collections APIs not have the last method?'' (Q69658).
What: The questions developers used to ask about the specific problems, architecture, and techniques or events are called what-type questions. These questions are about interoperability, performance, code crashes, run-time issues, memory organization, and for certain frameworks or devices. Such types of questions can help developers to make different types of decisions in program development, e.g. ''what are some good photo and artwork APIs '' (Q79186).
Information Seeking: In human and technological contexts, information seeking is the action or activity of striving to gain information. It is different from information retrieval. In these posts, the developers ask for general opinions of developer communities about their problems, such as ''web service sync database main application architecture'' (Q325394).
Information Giving: Effective information sharing enables application developers to make well-informed decisions and completely engage in the development process. It also strengthens the WSs and client developers' relationship and can help to reduce developers' communication gaps, such as ''enriching JWT after OpenID connect flow'' (Q426172).
Others: Posts (1%) that don't fit into one of the above five categories are marked as others, such as ''pending and approval process'' in (Q156185).
As explained above, the first and second authors labeled each post with one of the six question types. Cohen's Kappa test was used to determine their level of agreement [121]. We had a substantial agreement on the 8,739 identified posts (κ = 0.80). This degree of agreement was comparable to the agreements of two earlier studies by Abdellatif et al. [18] and Rosen et al. [110]. There were also some disagreements among the authors regarding a few questions, which were resolved by discussions and multiple reviews until they reached a consensus [110]. We went through three iterations to obtain the best label for each question type. The detailed data of this process is available in the online dataset link.
Some questions had multiple labels, such as How, What, and Info seeking. The intent of a question's content might differ from the post's title. The questions labeled as ''what'' tended to also be labeled as Info seeking, such as ''multi-platform password storage with retrieval for applications with authorization''. However, the total percentage of such questions was very small compared to the 1,765 categorized questions.
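For reference, Cohen's Kappa for two annotators can be computed as in the sketch below. The function cohen_kappa() and the example labels are illustrative only and are not the study's annotation data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for four questions.
annotator_1 = ["how", "how", "what", "why"]
annotator_2 = ["how", "what", "what", "why"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on three of four items, but Kappa is lower than 0.75 because part of that agreement is expected by chance.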

3) RESULTS
The percentages of each type of question for the six high-level WSs groups are shown in Table 5.
What: This type of question dominates the Web API category (53.5%), e.g., Q143887 and Q274975, in which client developers mostly intend to solve specific issues such as (1) communication among applications, e.g., Q143887, (2) architecture design, e.g., Q349114, and (3) benefits of using WSs, e.g., RESTful services in Q349114.
How: This type of question is common in the framework support (36.7%), client application development (33.5%), and Web API (31.6%) categories. For example, Q224740 focuses on the framework used with WSs, and Q423552 focuses on the usage of WSs in applications. The Web APIs category covers various aspects of WSs, such as developing and deploying various types of web APIs, e.g., Q329833 and Q433118.
Why: This type of question is mostly asked in the Data Processing category (5.5%). Why-type posts mostly addressed Database Management and File Handling problems in WSs-based applications, e.g., Q310514 and Q34400. Web APIs (5.0%), Mobile application development (4.8%), and the other categories contain very few why-type questions.
Info. Seeking questions are major in the Client API development category (31.4 %). These questions are mostly asked in Core App Model Designing topic related to the issues in development of systems, e.g., Q141215 & Q214447. In Data Processing category, 29.3% of questions are of type Info. Seeking. In this category, most of Info. Seeking questions are related to Cache management database, e.g., Q250104, and Web database issues, e.g., Q332260. 26.5% of Mobile App Development category posts seek information about language compatibility, e.g., Q391521, and complication concerns, e.g., Q399342 of developers while using WSs. There are also some posts about gathering different types of information in the category of Framework support (17.2%), e.g., Form management for web Applications Q213639 and web APIs (15.1%), e.g., Project structures and operations Q149724.
Info. Giving: This type of question is most common in the categories of Data Processing (10.9%), Client API development (6.3%), and Web API (4.4%). These questions conveyed various types of information about multiple aspects of applications. Q327804 explained the process used by RESTful APIs to handle partial and nested objects. Q380192 gave information about a micro-services use case for a shared database in WSs-based applications. Q134220 discussed sending a collection of data to an API in multiple small calls vs one big call. The categories Mobile App Development (2.8%), WS Authorization (0.7%), and Framework support (1.6%) have very few information-giving questions.
Others: This type of post is most prevalent in the Data Processing category with 2.6% of all questions, e.g., Q358532 ''corporate website design using WSs''. All other WSs categories have very few questions that are marked as other. Figure 11 shows the percentage of each type of question over all WSs posts. We observe that a major part of the posts (40%) are of the 'what' type, followed by 'how' (27%), 'info. seeking' (21%), 'why' (4%), and 'info. giving' (4%). These results demonstrate that developers are more interested in asking questions regarding APIs, usage, faults, and frameworks. The developers' discussion is mostly on 'what' and 'how' questions (67%), showing that there is a need for more sources of guidance for developers to design and develop API-based client applications. The developers mostly asked questions of type what (40%), which suggests that they are concerned with general information about the supported features of the WSs frameworks for the client developer community. The info. seeking (21%) labeled posts (Figure 11) also show that the developers are requesting different types of information about usage patterns of WSs in the development of applications.

D. RQ4. HOW ARE THE DIFFICULTY AND POPULARITY OF WEB SERVICES PROBLEMS CHANGING ON SO?
The results of RQ1 demonstrate that different sorts of WSs-related issues are addressed on SO. RQ3 revealed that many posts of How, What, and Information seeking types are related to the client applications development. However, client application developers using WSs face various problems with specific architectures, procedures, framework support, and comprehension of WSs. Due to these problems, certain topics were repeated throughout the posts. As a result, not every topic was equally popular or challenging. Analyzing the topics' popularity and difficulty provided information on prioritizing research and development efforts. For example, newcomers to the WSs could emphasize more popular issues. Researchers of WSs may be able to come up with methods to develop architectures, methodologies, and tools that will be more usable in the development of client applications using the WSs.

1) APPROACH
To determine the popularity of every topic, we used three metrics: 1) the average number of views of the posts on each topic, 2) the average number of times the questions on a topic were marked as a favorite by users, and 3) the average score of all questions in a topic. These three features are standard measures for posts on SO, and the SO team mostly uses them to evaluate the popularity of posts. To determine how difficult it is to obtain solutions for questions relevant to WSs, we used two metrics: the percentage of a topic's questions without an accepted answer, and the median time in hours it took for a topic's questions to receive an accepted answer. The asker of a question can give feedback by marking an answer as accepted. The accepted answer to a question is considered accurate or of high quality. As a result, the lack of an accepted answer might indicate that the asker could not find any suitable solution to the problem. However, a question may also fail to get an answer because it is poorly written. The SO community works together to edit questions and answers to improve their quality. Thus, a lack of accepted answers likely indicates that other developers found it difficult to suggest an answer.
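The two difficulty metrics can be computed per topic as in the following sketch. The record layout (a 'hours_to_accepted' field that is None when no answer was accepted), the function name, and the sample values are assumptions made for illustration.

```python
from statistics import median


def difficulty_metrics(questions):
    """Return (percentage without accepted answer, median hours to accepted answer).

    questions: list of dicts with key 'hours_to_accepted', which holds the hours
    until an answer was accepted, or None if no answer was ever accepted.
    """
    accepted_hours = [
        q["hours_to_accepted"] for q in questions if q["hours_to_accepted"] is not None
    ]
    without_accepted = len(questions) - len(accepted_hours)
    pct_without = 100.0 * without_accepted / len(questions)
    # The median is undefined when no question in the topic has an accepted answer.
    median_hours = median(accepted_hours) if accepted_hours else None
    return pct_without, median_hours


# Made-up questions for a single topic.
topic_questions = [
    {"hours_to_accepted": 2.0},
    {"hours_to_accepted": None},
    {"hours_to_accepted": 10.0},
    {"hours_to_accepted": 4.0},
]
pct_without, median_hours = difficulty_metrics(topic_questions)
```

Using the median rather than the mean keeps a few questions that waited months for an accepted answer from dominating a topic's value.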
Developers' ability to deliver timely and accurate responses was critical for a crowd-sourced community like SO. The average time it took to obtain a response to a question was only 21 minutes, although a tough question may take longer (Stack Overflow 2022). These five discussed metrics have been used in many past articles to calculate the popularity and difficulty of topics posted on the SO, e.g., Bagherzadeh et al. [108] studied big data topics, Ahmed et al. [109] studied concurrency topics, Abdellatif et al. [18] analyzed software application development topics, etc. When numerous metrics are used to assess quality, it can be confusing if one metric's ranking differs from another metric's ranking. That might happen with our topic popularity or difficulty analysis because each feature has several metrics. Therefore, we build two fused metrics, one to assess topic popularity and the other to evaluate topic complexity. These two measures are described below.
i. Fused Metric for Popularity: We began by calculating the three popularity metrics for every topic in our dataset. The average number of views on a topic might be in the thousands, with average scores ranging from 0 to 2 and average favorite counts from 0 to 3. Therefore, we divided the metric value of a specific topic by the average of the metric values over all 36 topics to normalize it. As a result, three new variables were created, one for each of the three normalized metric values. The normalized metric values for topic i are denoted ViewNi, FavoriteNi, and ScoreNi:

$$ViewN_i = \frac{View_i}{\frac{1}{36}\sum_{j=1}^{36} View_j}, \qquad FavoriteN_i = \frac{Favorite_i}{\frac{1}{36}\sum_{j=1}^{36} Favorite_j}, \qquad ScoreN_i = \frac{Score_i}{\frac{1}{36}\sum_{j=1}^{36} Score_j}$$

The fused popularity FusedPi of topic i was computed as the average of these three normalized metric values:

$$FusedP_i = \frac{ViewN_i + FavoriteN_i + ScoreN_i}{3}$$

ii. Fused Metric for Difficulty
For each topic, we first calculated the two difficulty metrics. We normalized the metric value of a specific topic by dividing it by the average of the metric values of all 36 topics. As a result, two new variables were created, one for each of the two normalized metric values. The normalized metric values for topic i are denoted PerPost-WoAcceptedAnsNi (percentage of posts without accepted answers) and MedHrToRectAccAnswerNi (median hours to receive an accepted answer).
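The normalization and fusion steps can be sketched as follows. The helper names and the metric values are hypothetical; averaging the two normalized difficulty metrics is an assumption made by analogy with the description of the fused popularity.

```python
def normalize_by_mean(values):
    """Divide each topic's metric value by the mean of that metric over all topics."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def fused_metric(*metric_columns):
    """Average the mean-normalized metrics topic by topic.

    Each argument is one metric as a list with one value per topic,
    with topics in the same order in every list.
    """
    normalized = [normalize_by_mean(column) for column in metric_columns]
    return [sum(values) / len(values) for values in zip(*normalized)]


# Made-up metric values for two topics.
avg_views = [100, 300]
avg_favorites = [1, 3]
avg_scores = [2, 2]
fused_popularity = fused_metric(avg_views, avg_favorites, avg_scores)

pct_without_accepted = [40.0, 60.0]
median_hours = [2.0, 6.0]
fused_difficulty = fused_metric(pct_without_accepted, median_hours)
```

Dividing by the mean puts metrics with very different ranges, such as views in the thousands and scores between 0 and 2, on a comparable scale before they are averaged.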

i: Topic Popularity
In this sub-section, we talk about the popularity of topics. In the coming sub-sections, we also look into the difficulties of topics and the relationship between the popularity and difficulty of topics. Table 6 represents four popularity measures and the number of questions for each of the WSs topics: 1) Avg_View_Count, 2) Avg_Fav_Count, 3) Avg_Score_ Count, 4) Fused_POP (Linear fusion of the first three metrics yields the total popularity of a topic). In Table 6, the topics are organized in descending order according to the Fused_POP values of topics. Advantages and disadvantages of Web Applications topic from the Clients APIs development category have the highest Fuesd_POP value. This topic focused on the pros and cons of using WSs on various aspects of client applications, such as improvement, interoperability, and upgradation of client applications according to evolution in WSs.
The ''DB and data processing in applications'' topic from the Data processing category has the second-highest Fused_POP value. This topic contains discussions about communication between various database management systems and WSs-based applications. The ''APIs call issues'' topic in web applications is the third most prevalent topic with respect to Fused_POP value. This topic has a higher number of views and a higher average favorite count than the two most popular topics. The questions under this topic focus on calling APIs from different types of web application sources. For example, Q341872 asks about ''rest API design multiple calls vs single call to the API''. The ''Web development problems'' topic in the ''Clients APIs development'' category has the highest favorite count in the SO dataset. These posts are mostly about the design, programming language, and hosting of web applications. For example, Q130267 asked, ''what programming language to choose for this XML and data processing task''. The ''Micro-service vs Monolithic architecture'' topic from the Client APIs development category is the least popular, with only 4% of all questions and a Fused_POP value of 0.14, as compared to 0.39 for the most popular topic (i.e., Advantages and disadvantages of Web Applications).
It was also observed that more than 50% of the questions were answered by the developer community, which is consistent with the observation of Chaqfeh et al. [122] in 2012.
ii: Topic Difficulty: The three difficulty metrics for each topic are presented in Table 7: (1) the percentage of posts without an accepted answer, (2) the median time in hours for a topic to receive an accepted answer, and (3) the Fused_dif value per topic, based on metrics 1 and 2. The ''Customer identification using asp.net framework'' topic from the Framework Support category and the ''Web services Architecture'' topic from the Web APIs category were recorded as very difficult topics. The ''RESTful API usage'' topic in the Web APIs category was ranked the third most difficult. It was observed that questions related to the Web APIs category were the most difficult to answer on SO.
The ''Web service Architecture'' topic from the Web APIs category was ranked as the second most difficult topic, and it lies in the top half of the popularity values in Table 6. This observation shows that WSs developers are interested in solutions based on web service architecture, yet they do not find enough support on SO to get correct answers. The ''RESTful API usage'' topic was the third most difficult topic on SO, with an average of 3.45 hours to receive an answer. According to Table 7, the ''DB and data processing in applications'' topic from the Data Processing category was the least difficult in terms of Fused_dif value. This topic had the second-highest view count and the second-highest favorite count among all topics. At the same time, ''Web development problems'' in Client APIs was the most popular topic in terms of average views. The ''APIs call issues'' topic in the same category was the most difficult based on the percentage of questions without accepted answers (58%) as well as the average time to get an accepted answer (1.2 hours). Many questions on this topic received up to six answers, none of which was marked as accepted.
The ''Web Service Accessibility'' topic in the ''Web services'' category appeared highly difficult, as 51% of its posts were without accepted answers. The posts that received an accepted answer had to wait for an average of 2.57 hours, which was the longest waiting time in this group. The ''Customer identification using asp.net framework'' topic was the most difficult among the topics in the Framework Support category. Questions under this topic were related to MVC, security, and customer identification using frameworks. Similar to the ''Web Service Accessibility'' topic, more than 49% of posts in Customer identification did not have accepted answers. For example, Q115445 reports, ''what are other technologies capable of creating websites with animation load detailed images cross browser o''. It was posted six and a half years ago and has been viewed more than 3,257 times, yet it still has no accepted answer. VOLUME 11, 2023

E. COMPARISON OF POPULARITY AND DIFFICULTY OF TOPICS
According to Figure 12, Web APIs is the most popular and most difficult category on SO in terms of Fused_POP and Fused_dif values. Figure 12 also shows that the overall difficulty level of all topics is considerably higher than their popularity. We further observed a weak negative correlation between the popularity and difficulty of WSs topics (-0.12). The RESTful APIs are more popular but they have lower difficulty according to Table 6. For other topics, the relationship was less apparent; for example, web API usage patterns was the fourth most difficult topic but only the ninth most popular. However, at the 95% confidence level, these correlation measurements are not statistically significant. Nonetheless, WSs educators may use this knowledge to develop plausible and acceptable answers to challenging problems and to promote specific topics to increase their popularity.
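The correlation check discussed above can be sketched as follows. The text does not name the coefficient or the significance test that was used, so this sketch assumes a Spearman rank correlation with a two-sided permutation test; the popularity and difficulty values are hypothetical and the function names are illustrative.

```python
import random
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, trials=2000, seed=0):
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical fused popularity and difficulty values for six topics.
popularity = [0.39, 0.38, 0.30, 0.25, 0.20, 0.14]
difficulty = [0.9, 1.4, 0.7, 1.2, 1.1, 1.0]

rho = spearman(popularity, difficulty)
p_value = permutation_p_value(popularity, difficulty)
significant = p_value < 0.05  # 95% confidence level
print(round(rho, 3), significant)
```

A weak coefficient with a large p-value, as in this toy example, matches the situation described above: the sign of the correlation is suggestive, but it cannot be distinguished from chance.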

V. DISCUSSIONS
In Section V-A, we look at the stability of the topics we identified. Section V-B explains why we do not include unaccepted answers in our topic modeling process. Finally, in Section V-C, we compare our findings to those of earlier studies that applied topic modeling to SO posts in other domains.

A. STABILITY ANALYSIS OF TOPIC
We applied the well-known LDA algorithm to discover topics from the SO dataset. This probabilistic topic model is run with several parameters, such as the number of topics, word density, and other hyper-parameters. According to Agrawal et al. [123], LDA with untuned parameters can yield different topics across multiple executions on a single dataset. We used the usual topic coherence measurement approach to identify the appropriate number of topics in our dataset, as mentioned in Section III. We also followed guidelines from the literature to set the values of document density α and word density β. However, we still need to make sure that the identified WSs topics are consistent across a pair of LDA executions on the selected dataset. To determine the stability of the identified WSs topics, we executed the following qualitative and quantitative steps:
Step1. We repeated the topic identification process on our SO WSs-related dataset using the same settings as in Section IV. The topics extracted in the new run were considered T2 topics, while those extracted in Section IV were referred to as T1 topics.
Step2. Using a method similar to Section IV-A2, we manually examined and labeled each of the 36 WSs topics for the second round. This labeling was done carefully by reviewing 10 to 25 questions for each topic. Some questions were chosen randomly, while others were chosen according to their correlation with the topic.
Step3. We manually went through the topic labels several times and combined similar topics. For example, we labeled topic Id 5 in T2 as 'File formats' instead of 'File handling' because the questions in this topic discussed the types and formats of files used in WSs.
Step4. The final list of topics was compared across T1 and T2. Figure 13 depicts the number of topics extracted from the SO dataset in rounds T1 and T2 after manual labeling. We observed that the same number of topics (36) appeared in both rounds, with one-to-one relationships for 35 out of the 36 topics. In Figure 13, the 'Joint Match' shows how one topic of T2 ('File formats') was combined with the T1 topic ('File handling'). Based on a detailed examination of these topics, we believe they can be joined under the T1 name 'File handling'. Apart from the joined topics, no topic was missing between T1 and T2. No new topic appeared in T2, which validates the parameters used in our experiment and indicates that they extracted a near-optimal set of topics from the selected dataset.
The manual validation of the topics across rounds T1 and T2 assures us that the topics discovered in Section IV were stable. However, human-based analysis is prone to unintentional subjective bias. Therefore, we considered five methods for automatically matching T1 topics with T2 topics. For each method, we began by selecting a topic Id i in T2 and comparing it to all topics j in T1 based on the matching criteria given below, with i and j ranging from 0 to 35.
i. Highest Common All P (P_all): If i and j have the highest number of common posts among all the topic [...]
[...] If i and j have the highest number of common posts among the top 10 words between them, we assign i from T2 to topic Id j from T1. The correlation score between the topic and the words determines the top ranking.
If, for a given matching criterion and a particular topic Id i from T2, more than one topic Id in T1 has the highest common score, we assign i to all of those topic Ids in T1. After completing the assignments between T1 and T2 topics using the approaches above, we compared them with our manual assignments to check whether each automatic method matched our human-based labeling.
The performance of the above five methods is shown in Figure 14 as a percentage of agreement with the manual assignments. With the 'W_T10' method, we attained a maximum of 95% agreement between the manual and the method-based assignments. Across all methods, the agreement with the manual assignment was at least 84%. This finding adds to the evidence that the WSs topics are consistent. The data of rounds T1 and T2 are shared in our online dataset link.
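The automatic matching and the agreement score can be sketched as follows for a criterion based on overlapping top words. This is a minimal illustration under our own assumptions: the topic word lists are hypothetical (three words per topic instead of ten), the function names are illustrative, and overlap is counted as the number of shared top words.

```python
def match_topics(t1_top_words, t2_top_words):
    """For each T2 topic, return the T1 topic id(s) sharing the most top words."""
    assignments = {}
    for i, words_i in t2_top_words.items():
        overlap = {j: len(set(words_i) & set(words_j))
                   for j, words_j in t1_top_words.items()}
        best = max(overlap.values())
        # Ties are kept: the T2 topic is assigned to every best-scoring T1 topic.
        assignments[i] = sorted(j for j, n in overlap.items() if n == best)
    return assignments


def agreement(assignments, manual):
    """Share of T2 topics whose manual T1 label is among the automatic matches."""
    agreed = sum(1 for i, j in manual.items() if j in assignments[i])
    return agreed / len(manual)


# Hypothetical top words per topic for two LDA rounds.
t1 = {0: ["rest", "api", "call"],
      1: ["file", "upload", "format"],
      2: ["token", "auth", "login"]}
t2 = {0: ["file", "format", "type"],
      1: ["api", "rest", "request"],
      2: ["auth", "token", "user"]}
manual = {0: 1, 1: 0, 2: 2}  # hypothetical manual T2 -> T1 mapping

auto = match_topics(t1, t2)
print(auto, agreement(auto, manual))
```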

B. ISSUES WITH NON-ACCEPTED ANSWERS
In this research, we used only questions and accepted answers in our topic modeling process. We did not take into account answers that were not marked as accepted. Three observations led to this decision. (1) Previous studies that applied topic modeling to SO posts focused only on questions and accepted answers when studying big data [108], concurrency [109], mobile applications [110], chatbots [18], and general technical problems on SO [12]. (2) According to several studies of SO posts, the quality of answers on SO is not always good. This is especially true for answers that are not accepted by the post creators [124], [125], [126], [127], [128]. (3) An answer that has not been marked as accepted may or may not be relevant to the question, and there is no easy way to tell. Figure 15 shows an example of a question and an unaccepted answer.
In Figure 15, the question is about creating websites with animation. This question has two answers, neither of which is accepted (only one answer is shown in Figure 15). The answer shown has a score of 2, indicating that the asker or other developers may have found it useful, yet its relevance to the question cannot be established reliably. Including such answers in our analysis would therefore have introduced noise and incorrect insights into WSs discussions on SO. As a result, we opted not to use unaccepted answers in our study. In the threats to validity section, we discuss the risk of losing potentially important insights by excluding these answers.

C. WSs TOPICS COMPARED TO OTHER DOMAINS
As we explained in Section II, many studies have applied topic modeling to SO posts to study topics across a wide range of domains. Each study analyzed posts about a specific domain, and the type and distribution of posts differed across domains. These domain-specific features may reflect the most up-to-date tools and methodologies as well as the number of developers participating in different research areas. Hence, a systematic comparison of similarities and differences among domains is helpful and interesting for client application developers. Therefore, we investigated the studies that used SO posts to identify topics in various domains. We explicitly looked for six metrics in each study. Four of the six metrics are relevant to the popularity of the identified topics: 1. the total number of posts analyzed in the study, 2. average views, 3. average favorite counts, and 4. average scores. The other two metrics are related to topic difficulty: 1. the percentage of questions without an accepted answer and 2. the median hours to accept an answer per topic.
The objective of this comparison is to see whether there are any commonalities or differences between the WSs topics and those in other domains. We examine the metric values as reported in the articles; we neither reproduce the findings of each publication nor preprocess any of the data. We used an article for comparison only if it included the above-mentioned metrics. Among the related papers in the literature, we found that studies in the following domains reported all of the above metrics: big data [108], chatbots [18], security [107], mobile apps [110], and concurrency [109]. The SO dataset was also used to study Blockchain [119] and deep learning [121], but neither of these articles reported the metrics of interest. Table 8 compares the six metrics we used in our SO study of WSs topics to those reported in existing research on different domains, such as big data [108], chatbots [18], security [107], and mobile apps [110]. The number of WSs posts is higher than the number of chatbot posts (8.7K vs. 3.8K), but it is lower than the values for the other domains (security, big data, and mobile applications). As previously stated, SO data was also investigated to analyze programmers' discussions about Blockchain [66] and deep learning [66], [120]. There are 32,375 posts in the Blockchain study and 26,887 posts in the deep learning study, but the other metrics are not addressed in these two studies. These statistics illustrate that WSs are an emerging paradigm in developers' discussions on SO. Although the number of WSs-related discussions on SO may be lower than in other domains, it is increasing across all six WSs categories, as reported in RQ2.
Regarding the other popularity metrics, WSs topics have an Avg-Fav-count and a number of answers similar to those of security topics. They have a higher Avg-Fav-count and Avg-Score-count than the chatbot domain. On average, the popularity metric values for WSs topics are closer to those of the security and mobile domains and higher than those of the other three domains. This finding may be explained by the fact that Big Data, chatbots, and Web Services are relatively new fields compared with Mobile Apps and Security. As a result, the latter two domains generate more discussion than the three newer ones.
For both difficulty metrics, WSs topics show values similar to those of the security and concurrency domains. However, among the three newest domains, WSs have the lowest average time to an accepted answer (WSs = 2.3, Big data = 3.3, Chatbot = 14.8 hours), which shows that the WSs community is relatively more active and involved than the other two.
The average response time difference between WSs and Chatbots is 12.5 hours. This difference is notable because Big Data, which has the second-longest average time among the compared domains, takes only 3.3 hours. The difference between WSs and Chatbots may be due to the types of questions shown in Figure 16.
In the WSs-related dataset, only 30.3% of all questions are of type How, whereas 61.8% of chatbot posts are of type How. In addition, the prevalence of What-type questions in WSs is 45.8%, as compared to 11.7% in chatbots, which shows that many WSs questions on SO aim at understanding the WSs paradigm, which is not the case for chatbots. We found that WSs developers are more involved than chatbot developers in answering and commenting on questions about different WSs architectures, methods, and tools.

VI. IMPLICATIONS OF FINDINGS
The following three WSs stakeholders can benefit from our findings: 1) client application developers, to prioritize their learning about the numbers and types of WSs for the development of software applications, 2) instructors, to guide the teaching of WSs topics, and 3) WSs researchers, to identify the most pressing requirements in the WSs domain and to keep WSs enthusiasts and general reviewers aware of the latest trends in WSs deployment.
In this empirical study, we draw conclusions and make suggestions based on what we observed in developers' discussions on SO. The results can be further validated through developer surveys. However, due to the variety of WSs topics we discovered in our research, designing a good survey and identifying a representative sample of WSs developers is a challenging task. Furthermore, our research covers many WSs posts (8.5K) and hundreds of WSs developers by focusing on certain WSs topics and categories. These findings may be used to design and run several WSs-based surveys. Below, we discuss our findings by referring to specific WSs questions and by looking at them through the literature and the current WSs ecosystem.
Figure 17 shows the popularity and difficulty of WSs topics based on three metrics: (1) the percentage of posts without accepted answers (difficulty), (2) the Avg-view count (popularity), and (3) the number of questions in a topic. The topic ''Relationship among App entities'' in the ''APIs Client development'' category is the largest topic because it contains more posts than any other topic. It also belongs to the more difficult half of the topics, with 49.5% of its posts without accepted answers. This observation highlights the challenges of relating and connecting the different components of WSs-based applications [53]. WSs are intrinsically vulnerable because of limited compatibility resources and the need to link various entities [129]. According to our analysis of the posts on ''Relationships among applications entities'', the communication protocols in WSs-based applications are challenging for developers to implement in various situations. For example, Q241564 asks: ''what is the evidence that an API has exceeded its orthogonality in the context of types''. This question was posted 11 years ago and has more than 350 views but no accepted answer.

A. CLIENT APPLICATION DEVELOPERS
Many web APIs are available online via websites such as ProgrammableWeb, RapidAPI, etc., and their number is increasing daily, which helps client application developers build web and mobile applications.¹¹ Due to the rising capabilities of new Web APIs, new designs, methodologies, and tools are needed. As a result, client developers should remain well aware of all recent and forthcoming WSs issues. Moreover, studies have shown that getting oriented in a new field is challenging [130]. Figure 17 depicts the trade-off between the difficulty and popularity of WSs topics, which is helpful for developers who want to understand the WSs architecture. For example, the topic ''Compilation of source code'' in the ''Mobile Application'' main category is very popular yet one of the easiest topics. As a result, a programmer may start by understanding source code compilation in the Mobile Application category.
WSs mostly require fast memory and energy resources. Developers may concentrate on developing and deploying software by studying the ''Memory Management'' topic of the ''Data processing'' category. Furthermore, developers may enable communication across applications by learning from the ''Web service Accessibility'' and ''Request response'' topics in the ''web services Authorization'' category, which are popular and not difficult. Finally, developers may learn about the issues of frameworks used to implement WSs in web and mobile applications from the topics ''Issues of Web application framework'' and ''Customer identification using asp.net framework'' in the ''Framework Support'' category.

B. WEB SERVICES EDUCATORS
One of the most popular topics with average difficulty, as shown in Figure 17, is Web Development Issues. It contains questions about WSs basics, such as the selection of language, API, framework, standard, etc., for the client side and service side of applications. Various posts on SO address framework and standard issues, e.g., Q103517 ''how do you learn a language standard framework API functionality'', and language-related issues, e.g., Q180131 ''what are web runtime environments and programming languages''. Many of these Client APIs development posts received more than 2,000 views, illustrating their remarkable popularity. For example, Q353228 asked: ''how to create APIs from mere programming language''. Such posts indicate the need for training for new WSs and client developers. Figure 7 shows the distribution of the six higher-level categories of topics by number of questions per year. However, the arrival of new questions decreased for ''Client APIs Development'' and ''Data processing'' in 2021. One reason for the decrease could be that, in the last few years, many web APIs have been indexed in well-known online API repositories, such as ProgrammableWeb,¹² GitHub,¹³ and free-for.dev.¹⁴ These repositories are updated daily, which is very helpful for client developers seeking any type of information about WSs.
Our investigation also indicates that most of the questions in the ''Client APIs development'' and ''web APIs'' categories from 2016 to 2021 are unanswered or without an accepted answer. For example, question Q289349 ''API Response in APP'', about API usage, was asked by the original poster (OP) four years ago and remains unanswered to date. According to Uddin et al. [5], as the WSs standards evolve, the need for guidance and discussion will gradually grow, and even most official documentation becomes outdated. Therefore, WSs educators could produce more material about different aspects of APIs to help new client developers, specifically for difficult topics. Figure 17 shows that most of the posts in all categories are without accepted answers, indicating a need for more awareness and training material for WSs usage and troubleshooting.

C. WEB SERVICES RESEARCHERS
The high popularity of the ''web development issues'' topic in the Client APIs development category, as shown in Table 6, may motivate growing research in online analytical processing in WSs. More studies are required to decrease the difficulty of certain topics. According to Figure 17, 50% of the questions across topics such as APIs call issues, RESTful API usage, Cache management database, and Request response are without accepted answers. Most of these topics relate to the Web APIs, Data Processing, and Client APIs development categories, which shows that these three categories are more challenging than the others. Figure 8 shows that the number of questions across all six WSs-related topics increased over time. As we mentioned in Section IV-C, posts of types How, What, and Information seeking are dominant among all categories except WS_Authorization, which indicates that problems with WSs deployment and authorization are the main topics of discussion among developers. In the Framework support category, one significant challenge appears to be the improper use of frameworks and programming languages while developing API client apps. Hence, studies in WSs may help ensure that web APIs are adequately supported for the development of web and mobile applications. Additionally, WSs research may draw inspiration from big data and crowd-sourced research to create tools that dynamically identify answers to unanswered questions, for example by suggesting workable responses to a question [128], [132].
¹⁴ https://free-for.dev/

VII. THREATS TO VALIDITY
To address the risks to the validity of our study and its findings, we followed the standard guidelines proposed by Wohlin et al. [131] for empirical studies. Threats to external validity concern the generalization of results. We analyzed the dataset of SO, one of the biggest and most well-known online question-and-answer forums for developers. However, we cannot claim that our results generalize to other question-and-answer websites. In our approach, we exclusively considered questions and accepted answers. Our methodology is in line with prior studies that used topic modeling on Stack Overflow datasets [107], [108], [110], [124], [126], [127], [128], [133]. As mentioned in Section V, it is hard to determine the relevance of an unaccepted answer to a post (automatically or manually) without knowing the asker's opinion of the answer. Our SO dataset is sufficiently large (8.7K posts), even without the unaccepted answers, and it spans posts over 13 years (2008-2022). Hence, it is expected that a topic raised in an eliminated post was already addressed in the 8.9K analyzed posts. However, we acknowledge the possibility that we may have overlooked relevant unaccepted solutions or topics.

A. INTERNAL VALIDITY
Threats to internal validity arise from experimental bias and analysis errors during the research. Specifically, we manually labeled all WSs topics in our research. To minimize the bias in this labeling, two authors assigned the labels independently, and then another author, an expert in the field, checked the labeling. Any disagreements among the three authors were explored and resolved through discussion.

B. CONSTRUCT VALIDITY
Threats to construct validity are associated with any errors that might be introduced when WSs-relevant data is extracted from SO. We collected all SO posts marked with one or more WS-related tags (68 unique tags). As discussed in Section III, the list of tags was compiled using the approaches of Yang et al. [107] and Bagherzadeh et al. [108], followed by manual validation. We applied the tag-expansion algorithm starting from five basic WSs tags (web-services, REST, SOAP, API, and Microservices). We selected two of these tags (SOAP and REST) by looking at the top 25 tags that co-occurred with the ''web-services'' tag on SO. We have provided detailed information on the selection of the five basic tags while discussing the relevant tag selection procedure on SO in Section IV-C. The disparity between theory, observations, and findings also refers to the threat to construct validity. The metrics we used to measure the popularity and difficulty of topics fall under this threat. However, we reduced the danger of inaccurate measurements by using metrics employed in prior studies by Abdellatif et al. [18] and Bagherzadeh et al. [108].
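A tag-expansion step of the kind referred to above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the study: it assumes the commonly described formulation in which a candidate tag is kept when its significance (share of the tag's posts that fall in the seed-tagged set) and its relevance (share of the seed-tagged set that carries the tag) both exceed thresholds. The threshold values, the toy posts, and the function name are hypothetical.

```python
from collections import Counter


def expand_tags(all_posts, seed_tags, min_significance=0.3, min_relevance=0.1):
    """Return the seed tags plus co-occurring tags that pass both thresholds."""
    seed = set(seed_tags)
    # Posts that carry at least one seed tag.
    seed_posts = [tags for tags in all_posts if seed & set(tags)]
    in_seed = Counter(t for tags in seed_posts for t in set(tags))
    in_all = Counter(t for tags in all_posts for t in set(tags))

    expanded = set(seed)
    for tag, count in in_seed.items():
        significance = count / in_all[tag]      # how specific the tag is to the seed set
        relevance = count / len(seed_posts)     # how common the tag is in the seed set
        if significance >= min_significance and relevance >= min_relevance:
            expanded.add(tag)
    return expanded


# Hypothetical posts, each represented by its list of tags.
posts = [
    ["web-services", "soap"],
    ["web-services", "rest", "java"],
    ["rest", "json"],
    ["java", "swing"],
    ["java", "generics"],
    ["java", "maven"],
    ["soap", "wsdl"],
]

tags = expand_tags(posts, ["web-services"], min_significance=0.5, min_relevance=0.5)
print(sorted(tags))
```

In this toy data, the general-purpose tag java co-occurs with the seed tag but is filtered out because most of its posts lie outside the seed set, which is the purpose of the significance threshold.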

VIII. CONCLUSION
In this study, we used the topic modeling technique to identify WSs-related topics in the discussions of developers on SO. We made several observations: 1) Client developers discussed six major topics on SO related to ''Client API Development'', ''Data Processing'', ''WSs Authorization'', ''Framework support'', ''Web APIs'', and ''Mobile App Development''. 2) The topic ''Advantages and disadvantages of Web Applications'' from the Clients APIs development category is the most popular topic (i.e., the highest number of questions), followed by the topic ''DB and data processing in applications'' from the Data Processing category. 3) All categories are growing rapidly on SO, i.e., posts regarding WSs design, data processing, development, and deployment are being added at an increasing rate. 4) Although a significant number of questions are of type What, questions of type How and information seeking are asked most frequently in two categories, ''Client API Development'' and ''Data Processing''. WSs developers use the platform (SO) to learn about various WSs-related design, development, and deployment issues and to discuss how to solve WSs-related problems. 5) Topics related to WSs-based applications and Data processing are the most popular. 6) Topics related to API architectures and parameters of WSs (e.g., client-service communication and the number, type, and size of parameters) are the most difficult, followed by the Framework Support topic (Issues of WSs-related framework). Our research provides opportunities for the many WSs stakeholders to enhance WSs design, usage, development, and deployment. WS developers can use our results to support and improve documentation, teachers and developers can use them to structure educational programs, and researchers can use them to focus on complex topics. To inform general WS users about this rapidly evolving technology landscape, tools can be developed that support the ongoing monitoring of the evolution of WS topics.
As future work, we intend to expand our research to focus on specific issues and to conduct studies on the most challenging topics, such as a survey of WSs developers to better understand the 36 topics we identified in our empirical investigation.