Complexity in Data-Driven Fuzzy Inference Systems: Survey, Classification and Perspective

Nowadays, data-driven fuzzy inference systems (FIS) have become popular for solving vague, imprecise, and uncertain problems in various application domains. However, many authors have identified challenges and issues in FIS development that stem from its complexity, which also influences FIS quality attributes. Still, there is no common agreement on a systematic view of these complexity issues and their relationship to quality attributes. In this paper, we present a systematic literature review of 1340 scientific papers published between 1991 and 2019 on the topic of FIS complexity issues. The obtained results were systematized and classified according to the following complexity issues: computational complexity, complexity of fuzzy rules, complexity of membership functions, data complexity, and knowledge representation complexity. Further, the research was extended by extracting the FIS quality attributes related to the found complexity issues. The key, but not the only, FIS quality attributes found are performance, accuracy, efficiency, and interpretability.


Introduction
Nowadays, data-driven fuzzy inference systems (FIS) have become popular for solving vague, imprecise and uncertain problems, such as prediction (Lee, 2019), network vulnerability evaluation (Fan et al., 2019), data classification (Ravi and Khare, 2018; Harandi and Derhami, 2016), image processing (Ananthi et al., 2016), data granularity (Zhu et al., 2018), forecasting (Lou and Dong, 2012), etc. The development of such FIS involves using a dataset to automatically generate the membership functions (MFs) and fuzzy rules used for inferencing or assessment. With a data-driven approach, a fuzzy model (i.e., MFs and fuzzy rules) can be learned quite efficiently and needs less expert input, i.e., potentially biased human information is minimized (Nasiri et al., 2011; McKay and Harris, 2016).
Moreover, these complexity issues influence FIS quality attributes (Wohlin, 1996; Nguyen-Duc, 2017). Still, there is no common agreement on a systematic view of these FIS complexity issues and their relationship to FIS quality attributes. Therefore, it is not clear which FIS complexity issues influence quality attributes.
Knowing the proper quality attributes in FIS context would be beneficial for FIS development.
The main research question of this review is: What are the complexity issues in FIS? (RQ1). RQ1 is further extended by RQ2: Which FIS quality attributes are influenced by FIS complexity issues?
In order to answer RQ1, a systematic literature review (SLR) is carried out. To answer RQ2 and to determine the relationship between FIS quality attributes and FIS complexity issues, the FIS quality attributes related to the found complexity issues are extracted. This research makes two contributions: first, it determines the possible set of complexity issues; second, it determines the FIS quality attributes related to the found FIS complexity issues. The rest of this paper is structured as follows. Section 2 introduces complexity in FIS and explains the use of this concept in this paper. Section 3 presents the review method. Section 4 shows the obtained results of the SLR. Section 5 discusses the results, answers the questions RQ1 and RQ2, and concludes the paper. This paper is an extension of work originally presented in (Miliauskaitė and Kalibatiene, 2020a).

Related work
In this research, we view the complexity concept in the context of FIS rather than in general. Therefore, in this section we present the complexity concept as analysed in the reviewed papers. We then view the FIS quality attributes presented in the analysed related papers in the scope of the found complexity issues. However, there are few studies on FIS quality attributes specifically, as opposed to software system quality attributes in general (Miliauskaitė and Kalibatiene, 2020a).
is more interpretable and less complex (Ishibuchi and Nojima, 2009). In (Askari, 2017), (Kaynak et al., 2002), authors suggest reducing the exponential complexity of FIS by reducing the number of fuzzy (linguistic) terms or the number of fuzzy (linguistic) variables or both. According to (Antonelli et al., 2016), the model interpretability is measured in terms of complexity: "Complexity is affected by the number of features used for generating the model: the lower the number of features, the lower the complexity".
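The exponential growth mentioned above can be illustrated with a minimal sketch. It assumes a complete rule base in which every combination of one linguistic term per input variable yields one rule; this grid-style assumption and the function name `max_rule_count` are ours for illustration and are not taken from the cited works.

```python
import math
from typing import Sequence


def max_rule_count(terms_per_variable: Sequence[int]) -> int:
    """Size of a complete rule base (illustrative assumption).

    Each rule picks one linguistic term per input variable, so the number
    of possible rules is the product of the per-variable term counts.
    """
    return math.prod(terms_per_variable)


if __name__ == "__main__":
    # Four variables with five terms each versus four variables with three terms.
    print(max_rule_count([5, 5, 5, 5]))  # 625
    print(max_rule_count([3, 3, 3, 3]))  # 81
    # Dropping one variable as well shrinks the rule base further.
    print(max_rule_count([3, 3, 3]))     # 27
```

Under this assumption, reducing either the number of terms or the number of variables shrinks the rule base multiplicatively, which is why both reductions are suggested.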
In (McCulloch et al., 2020), the authors understand complexity as the increase of fuzzy sets, e.g., fuzzy sets formed by the union of multiple MFs instead of a single MF. The authors of (Wang et al., 2020) analyse a sampled-data stabilization issue for T-S fuzzy systems with state delays and nonuniform sampling. In a real-time hardware application, the computational complexity of an algorithm becomes a critical issue (Velusamy and Pugalendhi, 2020). As the authors of (Velusamy and Pugalendhi, 2020) state, the computational requirement increases with the number of operations. Therefore, they propose simultaneous evaluation of MFs and the rule set according to decision variables using the water cycle algorithm, which results in a compact rule set and identification of an optimal path and, consequently, reduces the computational overhead. According to (Chiang et al., 2010; Miriyala et al., 2018; Yildiz, 2013), the presence of a large volume of data leads to a rise in the number of MFs that need to be computed, which makes FIS complex to implement.

FIS quality attributes
Nowadays, there are numerous works analysing software quality attributes. In this research, we emphasize FIS quality attributes related to the found complexity issues.
As the primary source for our study, we have used the ISO/IEC/IEEE International Standard (ISO/IEC/IEEE, 2017) as a unified source for the meaning of software system quality attributes. However, we have not derived and specified definitions of all possible software system quality attributes, so as not to limit our research or point it in the wrong direction.
According to (Febrita et al., 2017), an excellent interpretable FIS meets the following criteria: 1) fuzzy set transparency, 2) simplicity of fuzzy rules, and 3) simplicity of the fuzzy model. Fuzzy set transparency ensures that the fuzzy sets are distinct and distinguishable, i.e., the overlap between fuzzy sets is minimized. The simplicity of fuzzy rules concerns the type of rules used (Febrita et al., 2017), i.e., the number of OR, product, etc. operators used in composing rules. Therefore, to obtain a simpler FIS, the rules should be simplified, e.g., by replacing several inconsistent rules with a single equivalent rule (Gegov et al., 2017), by optimizing the parameters of a composite rule (Ma et al., 2017), or by rule optimization (Dhebar and Deb, 2020). The simplicity of the fuzzy model is determined by the number of inputs and inference rules. The more inference rules a FIS contains, the more complicated the system is (Febrita et al., 2017) and, consequently, the less interpretable it is. That is, the inference rule set should be sparse (Tan et al., 2016; Huang and Shen, 2008). Removing those rules that can be approximated by their neighbours allows us to reduce FIS complexity (Tan et al., 2016). The same can be said about the simplicity of the input dataset, i.e., the more complete and compact the dataset, the simpler the FIS obtained from it. Unfortunately, modern datasets are huge, sparse or high dimensional (Chaudhuri, 2014; Soua et al., 2013; Lucas et al., 2012; Luo et al., 2019), have wide-range characteristics (Chen et al., 2020), etc.
In the analysed papers, some authors mention FIS attributes affected by its complexity. In (Marimuthu et al., 2019), the optimal number of fuzzy rules makes it possible to achieve an accuracy rate of 95.8% and to reduce the computational complexity by triggering fewer rules. As stated in (Altilio et al., 2018), the optimal number of fuzzy rules and MFs, estimated using a Regularized Least Squares algorithm and following a procedure based on sparse Bayesian learning theory, makes it possible to achieve better FIS effectiveness. The authors of (Ravi and Khare, 2018) optimize MFs and fuzzy rules using the exponential brainstorm optimization algorithm, which allows them to be used effectively for data classification. The proposed approach obtained an accuracy of 88.8%, which is higher than that of the existing adaptive genetic FIS. In (Golestaneh et al., 2018), the authors aim to significantly reduce the neural network complexity by reducing the number of linear learning parameters, and to decrease the sensitivity while preserving acceptable accuracy and generalization performance. Askari (2017) proposed a novel Multiple-Input and Multiple-Output Clustering based FIS that satisfies the interpretability criteria.
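One way to make the fuzzy set transparency criterion concrete is to quantify how much two fuzzy sets overlap. The sketch below is an illustration only: it assumes triangular MFs and uses a Jaccard-style ratio (intersection via min, union via max) approximated on a uniform grid. Neither the measure nor the names `triangular` and `jaccard_overlap` come from the cited works.

```python
from typing import Tuple

Triangle = Tuple[float, float, float]  # (left foot a, peak b, right foot c), a < b < c


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangular fuzzy set."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def jaccard_overlap(mf1: Triangle, mf2: Triangle,
                    lo: float, hi: float, steps: int = 1000) -> float:
    """Approximate |A intersect B| / |A union B| over [lo, hi].

    Returns 0.0 for disjoint sets and 1.0 for identical sets; lower values
    mean the two fuzzy sets are easier to tell apart.
    """
    inter = 0.0
    union = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        m1 = triangular(x, *mf1)
        m2 = triangular(x, *mf2)
        inter += min(m1, m2)
        union += max(m1, m2)
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    low, medium = (0.0, 1.0, 2.0), (1.0, 2.0, 3.0)
    print(jaccard_overlap(low, medium, 0.0, 3.0))  # partial overlap, between 0 and 1
```

A partition whose neighbouring sets have a low overlap value would count as more transparent under this illustrative measure.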
Summing up, based on the related work, FIS complexity occurs in each component of FIS (see Fig. 1). However, there is no research analysing FIS complexity in a systematic way. Moreover, there is no common agreement on a systematic view of FIS complexity issues and their relationship to FIS quality attributes.

Review method
The review method was developed and executed according to the guidelines and hints provided by (Kitchenham and Charters, 2007; Kitchenham et al., 2009). The structure of the method is adapted from (Dybå and Dingsøyr, 2008) and its general schema is presented in Fig. 2.

Defining research questions and scope
Questions Formulation: the main question focus (RQ1) is related to the complexity issues of FIS. The secondary research focus is concerned with the FIS quality attributes related to the found FIS complexity issues.
Effect: Description of different FIS development complexity issues; visualisation of statistics by diagrams, view integration.
Studies Language: English. Studies Type Definition: Journal publications (articles, A) and proceeding papers (PP).
Searching sources evaluation: As the study is focused on complexity issues of FIS, relevant papers should be searched in databases covering Computer Science (CS), Information Systems (IS), and Software Engineering (SE).
The Web of Science (WoS) database was chosen for the analysis, since it covers a wide range of refined and non-duplicated research. Moreover, the initial study of sources shows that it contains a significant number of papers relevant to the research questions. It enables us to find the most suitable and complete high-quality refereed studies for our research. WoS indexes high-quality peer-reviewed papers from the most relevant digital libraries for computer science, including journal and conference papers from IEEE Xplore, Springer Link, Science Direct, and ACM. WoS and Scopus do not overlap in only 12.2% of documents in Engineering and Computer Science (Martín-Martín et al., 2018). WoS provides an Impact Factor (IF), which is calculated to assess the quality of publications and the level of scientific research in close fields of knowledge. Moreover, WoS offers an easy mechanism to export the search results in different formats supported by various reference management software, like Mendeley, EndNote, etc., and bibliometric tools, like VOSviewer, CiteSpace, etc.
The main keywords, their synonyms and the search string: the main keywords and their synonyms are presented in Table 1. The search string used in Web of Science is: (fuzzy) AND ("membership function*") AND ("develop*" OR "generat*" OR "construct*") AND ("issue*" OR "limit*" OR "complex*").
[Fig. 2. General schema of the review method: performing the search at the selected source; extracting data from the secondary set of papers; the primary set of complexity issues (RQ1); the set of FIS quality attributes related to the found FIS complexity issues (RQ2).]
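To make the Boolean structure of the search string explicit, the following minimal sketch evaluates it against a piece of text such as an abstract. It is an approximation under stated assumptions: the actual Web of Science engine has its own field, stemming and wildcard semantics, and the names `SEARCH_GROUPS`, `term_to_regex` and `matches_search_string` are hypothetical.

```python
import re
from typing import List, Pattern

# Each inner list is one AND-group; terms inside a group are OR-ed.
# The terms are copied from the search string; "*" is a trailing wildcard.
SEARCH_GROUPS: List[List[str]] = [
    ["fuzzy"],
    ["membership function*"],
    ["develop*", "generat*", "construct*"],
    ["issue*", "limit*", "complex*"],
]


def term_to_regex(term: str) -> Pattern[str]:
    """Translate one search term into a case-insensitive regular expression."""
    wildcard = term.endswith("*")
    stem = re.escape(term.rstrip("*"))
    # A wildcard term accepts any word continuation; otherwise require a word end.
    suffix = r"\w*" if wildcard else r"\b"
    return re.compile(r"\b" + stem + suffix, re.IGNORECASE)


def matches_search_string(text: str) -> bool:
    """True if every AND-group has at least one term occurring in the text."""
    return all(
        any(term_to_regex(term).search(text) for term in group)
        for group in SEARCH_GROUPS
    )


if __name__ == "__main__":
    abstract = ("We discuss the complexity of generating membership "
                "functions for fuzzy classifiers.")
    print(matches_search_string(abstract))                        # True
    print(matches_search_string("Fuzzy control of a robot arm."))  # False
```

A filter of this kind could only pre-screen records; the inclusion and exclusion criteria still require manual judgment.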

Studies inclusion and exclusion criteria
Here we present the main criteria used for including studies in, or excluding them from, the review.

Studies Inclusion Criteria (IC):
IC1: Universally accepted relevant fundamental works on MF development, MF generation, MF construction, FIS and issues, limitations or complexity.
IC2: Papers must be available to download.
Studies Exclusion Criteria (EC):
EC1: Exclude papers that contain relevant keywords, but in which MF and FIS issues, limitations or complexity are not the main topic.
EC2: Exclude relevant sources that repeat ideas described in earlier works.
EC3: Exclude papers shorter than 8 pages, since such short papers can present only a general idea, but not describe the overall approach.
EC4: If there are several papers by the same authors with similar abstracts, i.e., one paper is an extension of another, the less extended paper (i.e., the one containing fewer pages) is excluded.
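The criteria above can be read as a simple filter over the primary set of papers. The sketch below is a hypothetical encoding, not the procedure the authors ran: the `Paper` record and its field names are assumptions, EC1 and EC2 are represented as flags that a reviewer sets by hand, and IC1 is assumed to have been checked before the filter is applied.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_PAGES = 8  # EC3 threshold taken from the text


@dataclass
class Paper:
    title: str
    pages: int
    downloadable: bool                 # IC2
    on_topic: bool                     # reviewer judgment for EC1
    repeats_earlier_work: bool         # reviewer judgment for EC2
    extended_by: Optional[str] = None  # title of a longer version, if any (EC4)


def passes_criteria(paper: Paper) -> bool:
    """Apply IC2 and EC1 to EC4 to a single paper."""
    if not paper.downloadable:         # IC2
        return False
    if not paper.on_topic:             # EC1
        return False
    if paper.repeats_earlier_work:     # EC2
        return False
    if paper.pages < MIN_PAGES:        # EC3
        return False
    if paper.extended_by is not None:  # EC4: keep only the extended version
        return False
    return True


def select_secondary_set(primary: List[Paper]) -> List[Paper]:
    """Keep the papers that survive all encoded criteria."""
    return [p for p in primary if passes_criteria(p)]


if __name__ == "__main__":
    primary = [
        Paper("Extended study", 12, True, True, False),
        Paper("Short note", 6, True, True, False),
        Paper("Conference version", 9, True, True, False,
              extended_by="Extended study"),
    ]
    print([p.title for p in select_secondary_set(primary)])
```

Only IC2, EC3 and EC4 are mechanical here; the judgment-based criteria remain manual inputs to the filter.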
The main statistics of the search are presented in Table 2 and Fig. 3.

Table 2. Main statistics of the search (A: articles, PP: proceeding papers).
Set of papers | Years | A | PP | All
The primary set of papers | 1991-2019 | 864 | 476 | 1 340
The secondary set of papers | 1993-2019 | 79 | 23 | 102

In Fig. 3, the trend of the research on the topic is illustrated. The number of papers on FIS application to solve different complex domain problems has risen yearly. This increase can be attributed to technological development and the need to solve uncertain and vague problems in different application domains. However, the issues related to the usage of FIS are analysed insufficiently (Table 2 and Fig. 3).

Fig. 3. Number of all papers before and after applying IC and EC.

Threats to validity
This section discusses the potential threats to the validity of this SLR together with the actions we have taken to mitigate them. Although we carefully followed the SLR process (described in this section and presented in Fig. 2) to reduce the threats to the validity of the results and conclusions drawn in this paper, we faced some threats at different stages that need further discussion. Construct validity: When defining the SLR scope and keywords, we faced uncertainty about whether researchers refer to the FIS complexity issues or to the usage of FIS to solve particular tasks in a problem domain. Therefore, the inclusion of general keywords (like fuzzy inference system*, FIS, fuzzy system*, complex*, issue) into the search string to cover all of the related papers generated an initial pool of 4 357 papers. This mitigated the risk that the study setting does not reflect the construct under study, at the cost of additional manual effort, mainly when applying the inclusion and exclusion criteria. Consequently, a primary analysis of papers was done to become familiar with the FIS complexity issues and to define the related keywords more precisely. Some of the main related works are presented in Section 2.
When defining the searching strategy for paper selection, we faced two threats regarding the study's completeness, i.e., whether both (1) the searching sources and (2) the search string enabled all relevant papers to be retrieved. First, as mentioned previously, we used WoS since it enables us to find the most suitable, complete, and non-duplicated high-quality refereed papers for our research. Second, to deal with validity threats regarding the search string (i.e., missing keywords leading to the exclusion of relevant papers), we carried out a primary study while preparing (Miliauskaitė and Kalibatiene, 2020a). The final search string was the conjunction of the keywords presented in Table 1 in the scope of CS.
Finally, we are aware that our study has a limitation related to coverage. The number of candidate papers might have been affected because (1) the search string might not be complete and might require additional or alternative terms, and (2) only one search strategy was used to select the candidate papers. These issues can be addressed by using different keyword thesauri, other search strategies, like snowballing or bibliometric analysis, or softer criteria for paper inclusion and exclusion. However, considering the significant size of the primary set of papers (1 340), we consider that our results and findings are valuable for providing researchers and practitioners with an overview of the state of the art of FIS complexity issues.
Internal validity: An individual researcher's bias in (1) deciding whether to include a paper in or exclude it from the secondary set, (2) classifying it according to the complexity issues and FIS quality attributes, and (3) analysing the results poses an internal threat to validity in this research that could lead to biased or erroneous conclusions. We took two main actions to minimize this threat. First, we used a clearly defined searching strategy (see Fig. 2) to ensure a similar understanding. Second, both authors of this paper assessed the obtained results (primary and secondary sets of papers) independently and then combined the results.
External validity: A lack of consensus when researchers refer to the domain addressed in this study (e.g., the FIS complexity issues or the usage of FIS to solve particular issues in a problem domain) might lead to an inaccurate generalization of our findings. The results and conclusions of this SLR are only valid for FIS as understood in Section 2. We have made great efforts to systematically set up the SLR protocol and apply it to ensure that the general conclusions are valid irrespective of the lack of consensus highlighted.

Complexity issues in FIS (RQ1)
The main results of our SLR according to RQ1 are presented in Table 3. It consists of six columns, five of which present the complexity issues found in the abstracts of the secondary set of papers. They are the following: (1) computational complexity (CC) (i.e., a huge number of calculations in all FIS components; algorithm complexity); (2) complexity of fuzzy rules (CFR) (i.e., extraction, modification and optimization of fuzzy rules); (3) complexity of MF (CMF) (i.e., MF development, optimization, simplification; partitioning; FOU definition; fuzzy numbers); (4) data complexity (DC) (i.e., a large number of input variables, incomplete data); and (5) knowledge representation complexity (CKR) (i.e., development of MF and RB issues). The secondary set of papers is presented in Annex 1.
As can be seen from Table 3, the complexity issues are distributed by their frequency of occurrence in the analysed papers, in descending order, as follows. The most frequently found is the complexity of fuzzy rules (58 of 102 papers). The computational complexity (30 of 102 papers) and the complexity of MF (25 of 102 papers) were each found in less than a third of the analysed papers. The least frequent are the knowledge representation complexity (13 of 102 papers) and the data complexity (7 of 102 papers).
The temporal distribution of the five complexity issues found in the papers included in the review is given in Fig. 4. The size of the bubbles indicates the number of papers analysing each complexity issue per year. The larger the bubble, the more papers address the particular complexity issue.
As we can see from Fig. 4, the most relevant and constantly found complexity issue in FIS is complexity of fuzzy rules (CFR) (2) (58 of 102 papers). The second most relevant complexity issue is computational complexity of FIS (CC) (1) (30 of 102 papers). The third most relevant complexity issue is complexity of MF (CMF) (3) (25 of 102 papers).

Table 3. Complexity issues in FIS found in the abstracts of the secondary set of papers (1: mentioned, 0: not mentioned).
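The frequencies reported above can be converted into shares of the secondary set with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the names `ISSUE_COUNTS` and `issue_shares` are ours.

```python
from typing import Dict

TOTAL_PAPERS = 102  # size of the secondary set of papers

# Number of papers mentioning each complexity issue, as reported in the text.
ISSUE_COUNTS: Dict[str, int] = {
    "CFR": 58,  # complexity of fuzzy rules
    "CC": 30,   # computational complexity
    "CMF": 25,  # complexity of MF
    "CKR": 13,  # knowledge representation complexity
    "DC": 7,    # data complexity
}


def issue_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Percentage of papers mentioning each issue, rounded to one decimal."""
    return {issue: round(100 * n / total, 1) for issue, n in counts.items()}


if __name__ == "__main__":
    for issue, share in issue_shares(ISSUE_COUNTS, TOTAL_PAPERS).items():
        print(f"{issue}: {share}%")
    # The counts add up to more than the number of papers, so some papers
    # mention more than one complexity issue.
    print(sum(ISSUE_COUNTS.values()))  # 133
```

The shares come out at roughly 56.9% for CFR, 29.4% for CC, 24.5% for CMF, 12.7% for CKR and 6.9% for DC, which is consistent with CC and CMF each appearing in less than a third of the papers.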

Fig. 4. Found complexity issues according to years.

FIS quality attributes related to complexity issues (RQ2)
The main results of our research according to RQ2 are presented in Table 4. It consists of one column presenting references of the analysed papers and seventeen columns presenting the found FIS quality attributes, which are the following:
1) Accuracy: the ability to approximate the outcome of the system accurately (Liu et al., 2007). Accuracy refers to the capability of the fuzzy model to represent the system faithfully (Casillas et al., 2003).
2) Interpretability: the ability to describe the behaviour of the system in an interpretable way (Liu et al., 2007). Interpretability refers to the capability of the fuzzy model to express the behaviour of the system in an understandable way (Casillas et al., 2003).
3) Performance: in its general definition (Cortellessa et al., 2011), it measures how effective a software system is with respect to time constraints and allocation of resources. In the analysed papers, some authors explicitly mention a real-time constraint, like real-time prediction (Lee, 2019), real-time control (Chen et al., 2016), online rule learning from real-time data streams (Bouchachia and Vanaret, 2014), etc.
4) Robustness: as described in (Fernandez et al., 2005), robustness is the ability of a computer system to cope with errors during execution and with erroneous input. In (Fateh, 2010), robustness is verified by the performance of the tracking error. The authors of (Nie and Tan, 2008) evaluated the performance and robustness of their proposed fuzzy logic controller by changing coefficient values.
5) Flexibility: as presented in (IEEE Standards Coordinating Committee, 1990), flexibility (syn.: adaptability) is the ability of a system to be modified for use in applications or environments other than those for which it was specifically designed. In (Feng and Wong, 2008), the authors understand flexibility in terms of tuning MF parameters and fuzzy rules according to the searching space. The authors of (Modi et al., 2007) understand flexibility as the ability of a fuzzy system to form any number of clusters.
6) Efficiency: the degree to which a system performs its designated functions with minimum consumption of resources (Cortellessa et al., 2011). According to the authors of (Rajeswari and Deisy, 2019), efficiency refers to cost-effective training. In the analysed papers, a number of authors mention the effectiveness of their proposed FIS (see Fig. 5 and Table 4); however, they do not state directly what it means.
7) Stability: the authors of (Zhu et al., 2013) investigated the problem of stabilization for nonuniform sampling FIS by combining characteristics of sampled-data systems with a Lyapunov-Krasovskii functional, which gives a less complex and less conservative stabilization criterion. Fateh (2010) analysed stability for fuzzy control of robot manipulators without knowing the explicit dynamics of a system.
8) User-friendliness: refers to ease of use as a primary objective (Cortellessa et al., 2011). In (Kóczy and Sugeno, 1996), the authors mention that their FIS is user-friendly.
9) Transparency: refers to the ability to operate in such a way that it is easy for others to see what actions are performed. The authors of (González et al., 2007) and (Kenesei et al., 2007) link transparency with interpretability of FIS, i.e., they state that fuzzy rules should be transparent in order for FIS to stay interpretable.
10) Compactness: based on (Cortellessa et al., 2011), compactness refers to being faster or shorter than the original system. The authors of (GaneshKumar et al., 2014) state that their proposed hybrid Ant Bee Algorithm generated FIS with highly interpretable and compact rules for all the data sets when compared with other approaches. In , authors investigate compactness of rules as well.
11) Adaptability: according to (Cortellessa et al., 2011), it is a synonym of flexibility. However, some authors use different terms for the same attribute. Therefore, it is kept as a separate attribute.
12) Integration: according to (Cortellessa et al., 2011), it refers to the combining of software components into an overall system. In the FIS context, integration refers to the combination of FIS with other approaches, like neural networks (Lee, 2019), an adaptive principal component analysis approach (Alaei et al., 2013), etc.
13) Self-organizing: according to (Di et al., 2001), adaptation of MFs and self-organizing of fuzzy rules are realized using the self-learning and competitiveness of a neural network. In (Hsu and Szu, 2003), the authors used unsupervised learning algorithms, like the self-organizing algorithm, to derive MFs and fuzzy rules. The authors of (Rojas et al., 2000) proposed a self-organized fuzzy rule generation procedure.
14) Sensitivity: the authors of (Ravi and Khare, 2018) analysed FIS sensitivity on four datasets, which refers to a data-sensitive fault (according to (Cortellessa et al., 2011), it is a failure in response to some particular pattern of data). In (Golestaneh et al., 2018), the authors aim to reduce the network complexity by reducing the number of linear learning parameters, which reduces the sensitivity of FIS.
15) Reliability: refers to the ability of a system to perform its required functions under stated conditions for a specified period of time (Cortellessa et al., 2011). In (Zhou and Gan, 2008), the authors named reliability as the main attribute of FIS as complex systems used to model real-world systems.
16) Understandability: in (Zhou and Gan, 2008), it is related to the reliability of complex systems.
17) Validity: refers to the evaluation of the proposed approaches at the end of their development process to determine whether the system satisfies specified requirements, like in (Rajeswari and Deisy, 2019), (Dineva et al., 2017), etc.

Table 4. FIS properties related to the found complexity issues in the secondary set of papers (1: discussed in the abstract, 0: not mentioned in the abstract).

As can be seen from Table 4, not all authors in the secondary set of papers mention FIS quality attributes in their abstracts. Therefore, those papers are excluded from the analysis of RQ2.
The most popular quality attributes found in the analysed papers (see Fig. 5) are the following: (3)   In Fig. 6, the relationship between FIS complexity issues and FIS quality attributes is presented. From the figure, we can see the following.

Discussion and conclusions
Finally, we can summarise the obtained results and answer the research questions RQ1 (What are the complexity issues in FIS?) and RQ2 (Which FIS quality attributes are influenced by FIS complexity issues?). Based on Table 3, five main issues are extracted from the analysed papers: (1) computational complexity, (2) complexity of fuzzy rules, (3) complexity of MF, (4) data complexity, and (5) knowledge representation complexity. Fig. 4 shows that the computational complexity (1) and the complexity of fuzzy rules (2) were found in the analysed papers constantly throughout the analysed years. The complexity of MF (3) was found together with the computational complexity and the complexity of fuzzy rules, and an increase in its analysis has been observed in the papers since 2010. The relevance of these issues can be explained by the growth of technologies that generate increasing amounts of data. Therefore, the need arises to develop MFs from large data strings, which requires high computational power. The data complexity (4) issue has been relevant since 2011. Its relevance can be explained by the emergence of big data, unstructured data and the data-driven approach, and their usage in FIS. The knowledge representation complexity issue is rarely expressed directly. It is analysed in tandem with other issues, especially with the complexity of fuzzy rules, since it is recognized that rules are suitable for representing knowledge.
The analysis of the relationship between complexity issues in FIS and FIS quality attributes shows that some FIS quality attributes are significantly influenced by complexity issues in FIS, as follows: 1) the computational complexity influences performance, efficiency and accuracy; 2) the complexity of fuzzy rules is related to accuracy, efficiency and interpretability; 3) the complexity of MF is related to accuracy, performance, interpretability, efficiency and validity; 4) the data complexity is related to performance, efficiency and accuracy; 5) the knowledge representation complexity is related to performance, accuracy, efficiency and validity. Other FIS quality attributes are less significant in the analysed papers. This could be an indicator that some quality attributes might be suitable in a certain FIS context only. This may need further investigation.
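The relationships listed above can be written down as a small lookup structure, which makes it easy to ask which attributes are shared by all complexity issues or which issues touch a given attribute. The sketch only restates the five relationships from the preceding paragraph; the names `ISSUE_TO_ATTRIBUTES`, `attributes_shared_by_all` and `issues_affecting` are ours.

```python
from typing import Dict, List, Set

# Complexity issue -> quality attributes, restated from the text above.
ISSUE_TO_ATTRIBUTES: Dict[str, Set[str]] = {
    "computational complexity": {"performance", "efficiency", "accuracy"},
    "complexity of fuzzy rules": {"accuracy", "efficiency", "interpretability"},
    "complexity of MF": {"accuracy", "performance", "interpretability",
                         "efficiency", "validity"},
    "data complexity": {"performance", "efficiency", "accuracy"},
    "knowledge representation complexity": {"performance", "accuracy",
                                            "efficiency", "validity"},
}


def attributes_shared_by_all(mapping: Dict[str, Set[str]]) -> Set[str]:
    """Quality attributes related to every complexity issue."""
    return set.intersection(*mapping.values())


def issues_affecting(mapping: Dict[str, Set[str]], attribute: str) -> List[str]:
    """Complexity issues related to one quality attribute, sorted by name."""
    return sorted(issue for issue, attrs in mapping.items() if attribute in attrs)


if __name__ == "__main__":
    print(sorted(attributes_shared_by_all(ISSUE_TO_ATTRIBUTES)))
    print(issues_affecting(ISSUE_TO_ATTRIBUTES, "interpretability"))
```

Read this way, accuracy and efficiency are related to all five complexity issues, while interpretability and validity are each tied to two of them.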
Summing up, in this paper we have presented the analysis of the complexity issues found in data-driven fuzzy inference systems (FIS). We have statistically described and discussed the found complexity issues in FIS. Moreover, FIS quality attributes related to the found complexity issues were observed in order to determine the relationship among them.

Future works
The proposed set of complexity issues in FIS is a first step toward a deeper understanding of the complexity of FIS, which can be extended by applying a root cause analysis technique. It constitutes a basis for discussion and for subsequent work on finding the origins of complexity issues of FIS. Moreover, in future research, we plan to develop a framework of complexity issues of FIS and their possible solutions.
According to the obtained results and the identified future works, there is a need to automate FIS development with the possibility of choosing different levels of FIS complexity. Since different domains require different FIS regarding quality attribute values, FIS of varying complexity levels are obtained as a final result. Moreover, developing FIS of a particular complexity is a multi-criteria decision-making task, since a consensus among FIS quality attributes should be reached.