What Software Architecture Styles are Popular?

. One can meet the software architecture style's notion in the software engineering literature. This notion is considered important in books on software architecture and university sources. However, many software developers are not so optimistic about it. It is not clear, whether this notion is just an academic concept, or is actually used in the software industry. In this paper, we measured industrial software developers' attitudes towards the concept of software architecture style. We also investigated the popularity of eleven concrete architecture styles. We applied two methods. A developers’ survey was applied to estimate developers' overall attitude and define what the community thinks about the automatic recognition of software architecture styles. Automatic crawlers were applied to mine the open-source code from the GitHub platform. These crawlers identified style smells in repositories using the features we proposed for the architecture styles. We found that the notion of software architecture style is not just a concept of academics in universities. Many software developers apply this concept in their work. We formulated features for the eleven concrete software architecture styles and developed crawlers based on these features. The results of repository mining using the features showed which styles are popular among developers of open-source projects from commercial companies and non-commercial communities. Automatic mining results were additionally validated by the Github developers survey.


Introduction
Software architecture [1] is a discipline within software engineering dealing with software systems' structural and behavioral design. Software architects and designers define how the system is organized, its components, how these components communicate, etc. Software engineering literature (see, for example, foundational works by Shaw and Garlan [2], Taylor et al. [3], Richards and Neal [4]) applies a notion of architecture style or pattern. Shaw and Garlan [2] define it as follows: «An architectural style defines: a family of systems in terms of a pattern of structural organization; a vocabulary of components and connectors, with constraints on how they can be combined.» Taylor et al. [3] proposed another definition: «An architectural style is a named collection of architectural design decisions that (1) are applicable in a given development context, (2) constrain architectural design decisions that are specific to a particular system within that context, and (3) elicit beneficial qualities in each resulting system.» These definitions are general and relatively abstract as well as most other definitions from software engineering books and papers. Usually, no clues on how these styles can be identified and implemented in a concrete software source code are given. This work summarizes our team's first results to understand better the concept of software architecture style and make it more tangible. To do so, we first tried to find out the developers' attitude towards the concept of software architectural style. Are real non-academic developers familiar with this concept in general and with different particular styles? Do they consider this concept useful in their everyday professional activities? Secondly, we tried to identify empirical features of software architecture styles, which can be used in practice to recognize the usage of software architecture styles in Java and Python programs. For our research, certain architecture styles were chosen. Then, we chose a small sample of software repositories, which were investigated to get empirical features of the architecture styles in source code. Afterward, the crawlers were written and applied to parse the bigger sample of open source repositories on GitHub 1 service. 9 Besides, we conducted two developers' surveys. The first survey aimed to find out the developers' attitude towards the concept of software architecture styles. We held the survey to understand better whether this topic is worth researching. The second survey aimed to validate the results of the crawlers' parsing. The aim of our research project -to identify empirical features of architecture styles -is new to software engineering, while the applied methods are well known. Surveys and repository mining were applied in many other research projects on code smells detection and design patterns identification (see Section 6). The methods we used had shown themselves as feasible in exploratory research projects. Due to the first survey results, developers have positive attitude towards architecture styles. Many of them apply this concept in their projects, and even more of them think it is beneficial to be acquainted with the concept. In data provided by the automatic crawlers we found, how much each of the chosen architecture styles is used in practice. We validated the results of crawlers mining using the second survey.

Research Questions and Paper Structure
In this paper, we consider the three following research questions. RQ1: What is the community attitude towards the concept of software architecture style? Software architecture is taught in universities. Technical experts and master coaches promote advanced styles. However, what does a typical software engineer think about this concept? We try to answer this question in Section 3 using a developers' survey. RQ2: How can we detect a software architecture style in code? Results of the RQ1 survey encouraged us to try to construct a procedure for detecting software architecture styles in an actual source code. To do so, we first needed to select features related to particular styles using which we can automatically detect them. Section 4 answers the second research question and presents style features and automated scripts which help us to detect styles in code. RQ3: What software architecture styles are popular in open-source projects? Finally, it is of interest to investigate the source code of existing software to decide what styles are popular. Fortunately, much open-source software is available for researchers in the modern world. Thus, we can mine open-source repositories and apply our architecture style detection tool to them. This procedure is presented and discussed in Section 5. Section 6 describes some works related to our research project, while Section 7 concludes this paper and proposes the ideas for further work.

Software Architectural Styles (RQ1)
Our first questions were as follows. Whether the concept of software architecture style is familiar to developers? Is this concept considered applicable? What particular styles are familiar to developers and are worth considering in the following steps of our research?

Architecture Styles Survey
To answer these questions, we provided a developers' survey described in this section. For our research, we have created a survey 2 using Google Forms 3 . This survey consisted of 3 categories of questions. Demographical questions: These are questions about programming experience, job area, preferences in technologies, and a respondent's frameworks.
10 General questions about software architecture styles: Whether participant had or had not heard and used the concept of architecture styles in their professional life? Questions about the set of particular architecture styles we selected for our research: We asked whether the participant knew the name of the style and how he or she thought it is possible to identify that certain style in code. These questions aimed to find out what community of developers thinks and knows about architecture styles usage and architecture styles identification.

What Styles did We Select?
We have selected the following eleven software architecture styles for this research: • Model-View-Controller (MVC) architecture; • Main and sub-programs; • Machine-learning-based software; • Event-driven software architecture; • Reflection-using software; • Data-centric software architecture; • Expert system; • Cloud-service-based software; • Software with containerization; • Aspect-oriented software architecture; • Reactive-based software architecture.
These particular styles were chosen based on software architecture pattern and style catalogs from foundational literature of the field [3][4][5][6][7]. Usually, software architecture books are large and contain profound discussions on each of the styles considered important by book authors. The list of software architecture styles is a massive one. We had to limit this list somehow for it to be treatable within a single research project's borders. To select the particular set of styles, we consulted with literature of the field [3][4][5][6][7] as well as Wikipedia.org information 4 . Some of these styles (for example, MVC and Event-driven architectures) are popular and frequently used among software developers. Others (for example, aspect-based software and expert systems) are not famous in modern software engineering. Besides, we selected styles for which we can define features based on which the style smells can be detected in source code. Thus, we consider it worth investigating this particular set of styles. However, we do not state that this is an exhaustive set.

Survey Data
The survey was held from September till December 2020. As it has been mentioned, Google Forms were used for the survey. The survey form was spread in different developer communities connected with various areas of development: game development, back-end development, front-end development, data science, etc. We hoped to achieve randomness and broader coverage by doing so. In total, 111 developers participated in the survey. Participants of the survey have different experiences in software programming. Fig. 1 shows participant programming experience in years. From this figure we can conclude that about half of all participants were in the middle of the experience range: slightly less than one quarter have experience from 1 to 2 years, a little bit more than one-quarter of the total have experience from 3 to 5 years. Experienced developers make one-third of the total number: about one-fifth have experience from 6 to 10, and slightly more than 15% have experience from 11 to 20 years. At the ends of the distribution, we can observe 5% of developers with experience less than 1 year and about 3% of very experienced developers who are in the field for more than 35 years.   Finally, we asked participants about the programming languages they used in their work. Fig. 3 show how they answered. It can be seen that the survey participants are using different languages in their practice. The top three most popular languages in our survey are Python (45.9%), Java (33.3%), and SQL (32.4%). Partially because of these results, we decided to continue our research based on Java and Python source code. Demographic data showed that our survey participants were similar to the typical software developers. For example, the participants' set of main languages is very similar to the well-known 12 TIOBE Index 5 . Our selection is somehow shifted to object-oriented languages for back-end development. However, of the top 10 languages in TIOBE Mar 2021 (C, Java, Python, C++, C\#, Visual Basic, JavaScript, PHP, Assembly language, SQL) 7 are also presented in the top 10 languages used by survey participants. Developers came from different fields, which are popular in modern software engineering.

Survey Results
Our survey asked whether participants used the concept of software architecture style in their daily work practice. Fig. 4 shows how they answered this question. In this figure, we can see that almost 40\% frequently use the concept of architecture styles in development. 36% of all participants use them from time to time, and one quarter does not use architecture styles at all. The next question brought us surprising results. We decided to find out what participants thought about the developers' community in general. In particular, we asked what participants thought about how their colleagues applied the concept of software architecture style in their work? Fig. 5 shows that only 14.4% think that developers from their community do not use the concept of architecture styles. Interestingly, developers tend to think their colleagues are significantly more familiar with the concept of software architecture style than themselves. 13 Finally, we were interested in what participants think about the feasibility of detecting architecture styles. The developers were asked whether they thought it is possible to identify the usage of a software architecture style in the source code automatically or manually. For each of the styles there were five possible answers as follows: • Yes, by looking at language constructions manually; • Yes, by looking at language constructions automatically; • Yes, by looking at frameworks (you can list frameworks in ``other'' section); • haven't used this style; • Other (open answer).
A participant was able to select several answers simultaneously.  Numbers in fig. 6 show how many developers selected this particular answer category for a particular architecture style. According to our data, developers are not very optimistic regarding software architecture style detection. However, for most architectural styles, at least one-third of all survey developers believe they can be identified either automatically (30% -50%) or manually.
14 Developers tend to think that familiar styles are more likely to be identified. For example, MVC is the most known style among the others. Most developers think it can be identified manually (53.2%) and automatically (51.4%). Expert systems and aspect-oriented software are unfamiliar to more than half of the developers from our selection. Not so many participants believe these styles can be identified by investigating the software source code.

Conclusions
The results of the developers' community survey are interesting events separately. However, we analyze them in the context of a larger project. The survey shows that 3 of 4 typical developers apply the concept of software architecture styles in their practice. Moreover, developers believe their colleagues use this concept even more often. This means that developers consider the concept important and valuable in the software engineering process. From 3.5 to 4 out of 10 typical developers believe that software architecture style can be identified by investigating the software source code. In general, slightly more developers think that a style can be identified manually. Besides, the more familiar a particular style is to the developer, the more likely it would be considered identifiable by this developer.
The crawler was written in Python 3. We used the library, called PyGitHub 7 . Every crawler gets access to the companies' repositories by tokens previously generated by us manually on Github.

15
Our crawlers got access to GitHub repositories by using the token mechanism. Every token allows making ten thousand requests to Github per hour. For the mining process to continue flawlessly, several tokens have been used. The tool iterates through the token list and requests the source code from every repository taken for the research. For every software architecture style, we created a separate specific crawler. Their code is accessible at the project web page.

Features of Software Architecture Styles
We have created features of different origins for eleven styles from our research. These features can be grouped into two main categories. The first group of features contains framework-based features. We firstly identified frameworks that propose implementations of particular architectural styles. After that, we identified usage of the style by finding usage of these frameworks in source code. Such features were used when identifying Model View Controller (4 python frameworks, 4 java frameworks), Machine Learning based style (24 python frameworks, 11 java frameworks), Event-driven (10 python frameworks, 8 java frameworks), Data-centric (25 python frameworks, 22 java frameworks), Expert systems (7 python frameworks, 3 java frameworks), Cloud systems (10 python frameworks, 7 java frameworks), Aspect-based applications (3 python frameworks, 1 java frameworks). The second group of features contains language-based features. This means that we first identified how certain styles are implemented in specific languages (Java, Python). After that, we identified usage of the style by finding particular language constructs. These features were used when identifying Main and Sub-programs, Reflection architecture styles.

Dataset Description
Our web crawlers gathered a dataset that we used to answer RQ3. This dataset consists of JSON files. Each file in the dataset is related to a triple: (programming\_language; company\_name; software\_architecture\_style). In total, the 3057 repositories were processed. 1682 of them are repositories with source code in Java, whereas 1375 contain Python source code. Each repository can contain code in other programming languages as well. The results of mining contain 172 JSON files. These files contain data on features identified for a particular triple. Each file includes a set of pairs (repository : [found_features]), where [found_features] is a list of features of the particular architecture style which were found in the repository.
Here is the example of such a pair: … "EmbeddedSocial-Android-SDK": ["NONE", "getName\_feature", "getClass\_feature"], … Every string includes a constant indicating if the related repository's processing was finished. It was used for repository processing and did not have any special meaning. Some of the lines may contain constant indicating that the mining process was stopped. This happened when a repository weighted too much to be processed by 10 000 requests of the crawler. In these cases, a GitHub API token reaches its' limit. This case was not frequent. In particular, 2162 pairs out of a total 27107 contain these stops. The full dataset is available on the project web page.

Data Analysis and Discussion
Summarized results derived from the dataset are presented in this section. The following tables show these results. Let us consider and discuss the popularity of particular styles. Note that open repositories of Apple company contain no source code in Java.

Model-View-Controller (MVC) architecture
In Table 2 one can see that MVC is used in approximately 25% of Microsoft, IBM, and Apache Java repositories. In Java repositories, the MVC style is mostly represented by the usage of the Spring framework that is very popular, especially in Apache Foundation projects. MVC is used in slightly less than 10% of Microsoft, IBM, Google, AWSlabs Python repositories (see Table 3). In Python repositories, MVC style is mostly represented by Django framework. Thus, web development is responsible for a significant fraction of usage cases in Python community. It is also interesting that Python is relatively more popular in open projects of commercial companies, whereas Apache Foundation is the leader in the development of Java projects.

Turbo Gears
This style is the second most popular} from all styles in our style set. It is a significantly more popular style than others. This conclusion agrees with the survey results shown in fig. 6.

Main and sub-programs
Let us consider fig. 7 and 8. Each of these two figures shows two intersecting disks. The left one shows the number of repositories with «main» function. The right one shows a number of repositories without the usage of constructors. Thus, repositories that satisfy our criteria lie in the intersection.
It is easy to see no more than 1 repository of such type in Java. It is not unexpected because Java is a pure object-oriented language. So, any Java program contains objects or classes.

Event-driven software architecture
According to Tables 4 and 5 Event-driven architecture style is used in approximately 9% of IBM and Apache Java repositories. Curiously, event-driven style is applied in more than 50% of AWS Python repositories and approximately 20% of Apache Python repositories. Such applications are related to distributed and asynchronous software for web applications. Other companies tend to apply event-driven style in less than 1% of their Java and Python repositories. Event-driven architectures are mostly represented by the usage of Kafka framework and Amazon Active MQ framework in both Java and Python repositories.    We can conclude that usage of event-driven architectures hugely variates from company to company and relatively popular in projects of AWS and Apache whose business is mostly web-based and large-scale oriented. Thus, these companies invest in scalable web applications and infrastructure code. Other companies concentrate more on desktop, mobile, and web applications without such need in scaling and asynchronous code.

Machine-learning-based software
The first general finding is that Java is not used commonly to develop machine-learning-based software. According to Table 6 we found smells of machine-learning-based style only in 16 Java repositories. On the other hand (see Table 7), this style is often used in Python repositories by various companies: Microsoft (61%), IBM (52%), Google (38%). ML source code is mostly represented by the usage of Numpy, Pandas, Matplotlib, and libraries for neural networks.  We can conclude that machine-learning applications are very popular in Python ecosystem. Most Python repositories of companies contain smells of ML. Moreover, we can conclude that machinelearning software is the most popular software style (with respect to a total number of repositories with this style) according to our data.  180  146 129  45  24  13  13  8  12  25  Data-centric 47  25  29  10  3  2  22  0  2

Data-centric software architecture
We found smells of data-centric style in 15%-25% of Java repositories (Microsoft, IBM, Google, Apache, see Table 6) and in 8%-20% of Python repositories (Microsoft, IBM, Google, 18F, see Table 7). In Java repositories, data-centric software style is mostly represented by PostgreSQL and MySQL libraries' usage. This style is represented by the usage of the SQLAlchemy library in Python repositories.
We can conclude that this style is the third most popular of all styles thanks to Apache Foundation with more than two hundred such projects. Other companies apply the style as well.

Reflection-using software
This style was detected using several language features that indicate reflection appliances in a source code. Many repositories contain at least one of the features of a reflective code. However, we believe that the code with such an ephemeral smell can be called reflection-using. However, what should be the number of reflective features in code to call it reflection-using software. It is not that easy to define the concrete number. Thus, we decided to show the summarized data in this paper. A better definition of this architecture style will be a subject for future work.
21 Fig. 9. Reflective code features in Java repositories Fig. 9 shows the results for Java repositories, whereas fig. 10 considers Python repositories. In both cases, one can see that about 20% of repositories contain no reflective code smells. So, we can conclude that about 80% of Java and Python repositories have at least one feature of Reflectionusing software.

Fig. 9. Reflective code features in Python repositories
Our conclusion is that most of the open-source software in our dataset contains some reflective code features. Our definition for this style is too vague and has to be refined.

All other styles: cloud-service-based, aspect-oriented, reactive-based software, expert systems, software with containerization
It is clearly seen in tables 6 and 7 that smells of all other software architecture styles are very uncommon in our dataset. Cloud-service-based style tends to appear in AWS Java repositories. This can be explained by the usage of AWS's own library for cloud development. Also, the Cloud-service-based style was found in Google Python repositories. It can be explained by the usage of Google's library for cloud development. These two cases are outliers, and overall we did not find out Cloud-service-based style as a popular one. The same is true for other styles. We did not find almost any usage of these styles with the proposed features. Thus, we can conclude all these software architecture styles are unpopular in open-source repositories of our dataset. This can be due to at least two reasons: either these styles are uncommon in open-source software, or we use flawed features. Both reasons are possible. The future work will be to elaborate on this issue.

Additional Results Validation
To verify the proposed feature model, we decided to ask developers of repositories, which we have processed, about the usage of the 11 architecture styles in their repositories. We extracted developers' emails from each repository we have processed. There were about 10 thousand repositories. Then we used Python code to send every one of them a letter with a link to the particular survey based on Google Forms. This form asked developers to specify what of our 11 software architecture styles they used in their repository. We got 69 replies to our Google Form. Out of these replies we extracted information on 78 repositories that we have previously processed.
One can easily see that we have no significant number of answers here. Thus, the following can not be considered as an extensive validation. However, we believe these results still can be of interest to the reader.
On every architecture style out of 11 we counted 4 metrics: accuracy, precision, recall, and F1-score. We considered developers' answers from the form as correct data and our answers as predictions.
Among the repositories that authors answered our survey, there was no that used the following styles: Main and sub-programs, Expert system, or software with containerization. Table 8 shows the results. According to the table, the best F1-score was reached for Reflection-using software (0.58) and Data-centric software (0.45). Recall overall was less than 30\% with such exceptions as Reflection-using software (0.64), Model-View-Controller (0.39), and Data-centric software (0.39). The highest precision was achieved for Event-driven software (0.75). The results are different for various architectural styles. We can conclude the following.
Features for Main and sub-programs style and Expert systems could not be validated because among the repositories from the validation survey, there was no use of these two styles. Features for software with containerization style are not full. Using our features, we did not find it in any of the repositories in which the style was used according to their developers. Features for Reflection-using software and Data-centric software styles have not been enough for perfect identification, but they showed appropriate F1-score results. Features for Model-View-Controller, Machine-learning-based, Event-driven, Cloud-service-based, Aspect-based, and Reactive-based software styles show bad performance, mostly because of low recall. This means that our features have not fully covered the usage of these styles, and further investigation is needed.

Conclusions
Most of the results obtained by our automatic crawlers agree with the survey results, which are shown in fig. 6. Less-known styles are less common in open-source repositories; well-known styles can be found in many more repositories. An outlier here is software with containerization style. Feature for this style seems ill-designed because many people are accounted for it, whereas we can not detect it in source code. Both an automated analysis and a survey indicate aspect-based software and expert systems as the least popular architecture styles. However, the additional validating survey (with a small number of answers) indicated that our features for some architecture styles show lousy performance. Thus, additional work is needed to improve the style and feature sets.

Related Work
We consider two large fields as related to our research. These fields are software architecture research and software repository mining. Whereas the former field is relatively old in terms of software engineering time scale, the latter is relatively young and fast-growing. We will try to observe both fields in this section. Sharma, Kumar, and Agarwal [8] listed 23 software architecture styles in 6 categories due to the application type. This paper can be considered as a starting point to discuss architecture styles. The authors have chosen some styles (what styles?) out of all mentioned and gave short descriptions to them. However, there cannot be observed any code features of any style which can be used to identify it in a real project. The paper also leaves without attention statistical aspects of architecture style popularity in practice. Automated software architecture recovery is related as well. Researchers in this field aim at constructing models of architecture decisions of existing software using data analysis and other automated techniques [9]. In software repository mining papers on code smell detection are close related to our project. Fontana et al. [10] concentrated on code smells and a machine learning-based approach to code smells detection. The authors collected a dataset of heterogeneous systems and a set of tools for detecting code smells and trained different machine learning algorithms with default parameters. Boussaa et al. [11] introduced code smell detection based on genetic algorithms that are called the competitiveco-evolution-based method. The method's idea is to generate two data samples: a sample of code smells and a sample of solutions. The aim of code smells generation is to escape from search methods, and the solution aims to cover more code smells. These works do not pay attention to software architecture styles, but their general approach seems attractive. A repository mining method has been applied to reveal how software architecture evolves with time [12]. Code mining can help to evaluate software architecture as well [13]. Kouroshfar et al. \ [14] applied automated architecture recovery techniques to show how the erosion of software architecture decisions influences software evolution. There is a massive corpus of literature on software architectural smells and their automated detection. Architectural smells are signs of bad practices in the software design process, similar to code smells. The difference is that architectural smells are related to the level of general design decisions, whereas code smells are related to anti-patterns and bad practices on the level of software code. Fontana et al. [15] investigated how these two types of smell are interrelated. Previously, many automated tools have been developed to detect or predict architectural smells [16][17][18][19][20]. Azadi et al. [21] even proposed a catalog of such smells which different tools can detect. Features of software 24 architecture styles that we consider in this paper are similar to smells. However, our features do not sign bad practices or anti-patterns. Contrariwise, our features indicate the presence of an architectural style. Note that surveys are considered a good research tool in empirical software engineering. For example, Palomba et al. [22] used surveying to understand how developers feel a relationship between code and community smells. In our research, we apply surveys as well.
Recently, repository mining has been used to explore software in an empirical study on what software project artifacts are [23]. Not surprisingly, software projects consist of code but also of documentation, data, and many more different artifacts. Our research is similar in the sense of intentions. We seek for better understanding of the current field of software development.

Conclusions and Further work
In this paper, we measured industrial software developers' attitudes to the concept of software architecture style. We also investigated the popularity of eleven concrete architecture styles. We found that the notion of software architecture style is not just a concept of academics at universities. Programmers apply this concept in their work. Moreover, industrial software developers consider the concept as improving their professional skill-set.