MOSY: A method for synthetic opinions to yield a robust fuzzy expert system

Abstract
In many domains, decision-making is challenging because experts are often limited in availability. Without a sufficient number of expert opinions, however, the associated solutions would not be robust. Motivated by this, MOSY, a Method for SYnthetic Opinions, has been developed to produce a robust Fuzzy Expert System (FES) by specifying the number of (synthetic) experts per rule. For each of these "synthetic experts", MOSY produces an opinion from a normal distribution characteristic of a human expert. Correspondingly, the FES is used to produce an opinion from an antecedent vector whose elements are sampled from a uniform distribution. The synthetic and human opinion vectors, resulting from all rules and the number of experts per rule, are driven to agree through optimization of the weights associated with the fuzzy rules. The weight-optimized MOSY was tested against sets of human expert opinions in two distinct domains, namely an industrial development project (IDP) and passenger car performance (PCP). Results showed that the synthetic and human expert opinions correlated between 91.4% and 98.0% on average for 5 to 250 synthetic experts per rule, across five outcomes of the IDP. Likewise, for the PCP, the respective correlations varied between 85.6% and 90.8% for 10 to 150 synthetic experts per rule, across the two performance measures. These strong correlations indicate that MOSY is capable of producing synthetic opinions to yield a robust FES where sufficient human experts are not available.
• This method, known as MOSY, generates synthetic expert opinions to achieve robustness in an FES.
• MOSY was validated against sets of human expert opinions in two distinct domains.
• Strong correlations were observed between the synthetic and human expert opinions.

Method details
Idea: To introduce the idea underlying the method presented here, consider the simple problem of measuring the true length of a bar. Trained metrologists (experts) measuring the bar independently would report values scattered in the form of a classical normal distribution (ND) about the true length. Thus, from a sufficient number of independent expert measurements, one could obtain a reliable estimate of the true length. Viewed another way, the degree of expertise is characterized by the extent to which a measurement deviates from the mean or, in general, from the center of truth (the true length in the above case). Drawing on this analogy, if one were to consider an FES [1] composed of a fixed set of antecedents, one could "synthetically" produce the associated "expert-generated" consequent by sampling from an ND that adopts the center of truth as its mean and a standard deviation that reflects the degree of expertise. Fig. 1 illustrates this idea by comparing expert and non-expert opinions about the true length. While expert opinions (points in black) are most likely to cluster around the true length, measurements reported by non-experts (points in red) would lie far from it. In Fig. 1, the subscripts min and max denote the smallest and largest lengths measurable by the metrologists.
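The bar-measurement analogy can be illustrated with a minimal Python sketch. It assumes that expertise is modelled solely by the standard deviation of the ND (small for experts, large for non-experts) and that measurements are clipped to the measurable range; the true length, the range limits and both standard deviations are hypothetical values chosen only for illustration.

```python
import random
import statistics

def sample_measurements(true_length, sigma, n, lo, hi, rng):
    """Draw n measurements from N(true_length, sigma), clipped to [lo, hi]."""
    return [min(hi, max(lo, rng.gauss(true_length, sigma))) for _ in range(n)]

rng = random.Random(0)
TRUE_LENGTH = 3.0        # hypothetical center of truth
L_MIN, L_MAX = 1.0, 5.0  # hypothetical smallest and largest measurable lengths

# Assumption: a small sigma stands for a high degree of expertise.
experts = sample_measurements(TRUE_LENGTH, 0.1, 1000, L_MIN, L_MAX, rng)
non_experts = sample_measurements(TRUE_LENGTH, 1.0, 1000, L_MIN, L_MAX, rng)

print("experts:     mean", statistics.mean(experts), "spread", statistics.stdev(experts))
print("non-experts: mean", statistics.mean(non_experts), "spread", statistics.stdev(non_experts))
```

Under these assumptions, the expert sample clusters tightly about the center of truth, whereas the non-expert sample spreads over most of the measurable range.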
Yang et al. [2] earlier demonstrated a similar concept of generating virtual samples from a Gaussian distribution to produce training sets for neural networks. Although the present idea also relies on such sampling, the originality of our method lies in its implementation within an FES.

MOSY
Working: Building on the idea described above, MOSY, a Method for SYnthetic Opinions, was developed and is described here. As shown in Fig. 2, MOSY is composed of three modules: the FES, a genetic algorithm (GA) [3] for rule weight optimization, and SO, a module for synthetic opinion generation. The FES is constructed from the knowledge of only one human expert.
The SO module, a key constituent of MOSY, produces a scalar sample from the ND characteristic of an expert opinion, as discussed in 'Method details'. Given the number of (synthetic) experts adopted for each rule of the FES, SO generates a total number of samples equal to the number of rules multiplied by the number of experts per rule, forming the vector of synthetic opinions in Fig. 2. Correspondingly, on the FES side, an antecedent vector is generated for each of these samples, with elements sampled from a uniform distribution (UD). These UD samples lie within the support of the membership functions associated with the antecedent vector. For each antecedent vector, a consequent is produced through the respective rules matrix by the standard Mamdani inference process shown in Fig. 3, leading to the vector of consequents from as many antecedent vectors.
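The two sampling steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the standard deviation of 0.3, the three antecedents per vector and the value of 10 experts per rule are hypothetical, while the centers 1, 2, 4 and 5 and the support (1, 5) are borrowed from the IDP case.

```python
import random

def synthetic_opinions(centers, n_experts, sigma, rng):
    """SO module sketch: for each rule's center of truth, draw n_experts
    opinions from a normal distribution (sigma is an assumed value)."""
    return [rng.gauss(c, sigma) for c in centers for _ in range(n_experts)]

def antecedent_vectors(n_samples, n_antecedents, lo, hi, rng):
    """FES side sketch: one antecedent vector per sample, with elements drawn
    from a uniform distribution over the membership support [lo, hi]."""
    return [[rng.uniform(lo, hi) for _ in range(n_antecedents)]
            for _ in range(n_samples)]

rng = random.Random(42)
CENTERS = [1.0, 2.0, 4.0, 5.0]  # one center of truth per rule (IDP values)
N_EXPERTS = 10                  # hypothetical number of synthetic experts per rule

opinions = synthetic_opinions(CENTERS, N_EXPERTS, 0.3, rng)
antecedents = antecedent_vectors(len(opinions), 3, 1.0, 5.0, rng)

# Total samples = number of rules x number of experts per rule.
print(len(opinions), "synthetic opinions,", len(antecedents), "antecedent vectors")
```

Each antecedent vector would then be passed through the FES to obtain the consequent that is compared with the corresponding synthetic opinion.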
Rationale: Fig. 4 illustrates the rationale of MOSY, according to which the synthetic opinion vector represents the true opinions and, as the number of experts per rule increases, the FES consequent vector can be driven to agree increasingly with it. More precisely, the character of the FES consequent vector is determined both by the character of the underlying FES and by its rule weights. At the same time, as Fig. 4 shows, the synthetic opinion vector would possess a multi-cluster character, with each cluster forming about the center of truth associated with each of the rules. Also, as the figure shows, the two vectors would agree well when the opinions in one closely follow the pattern of those in the other. On the other hand, agreement would be average to poor for moderate to contrasting opinion patterns between the two vectors. Accordingly, to maximize agreement between the two vectors, we adopt the Euclidean norm of their difference as the objective, to be minimized by a search across the rule-weight space, as given by (1).
The GA generations continue until the objective falls below a specified tolerance. This process leads to a robust FES with weights optimized for maximum agreement between the human and synthetic expert opinions. Robustness is attributable to the fact that many (synthetic) expert opinions shape the rule weights.
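The objective and the weight search can be sketched as follows. The sketch makes several loudly stated assumptions: `toy_fes` is a hypothetical stand-in (a weighted mean of rule centers) rather than the Mamdani FES of the paper, `run_ga` is a deliberately small genetic algorithm with elitism, blend crossover and Gaussian mutation rather than the GA implementation used by the authors, and antecedents are drawn from a window around each rule's center only to keep the toy problem well posed. Only the Euclidean-norm objective, the crossover fraction of 0.8 and the tolerance-based stopping rule follow the text.

```python
import math
import random

CENTERS = [1.0, 2.0, 4.0, 5.0]  # centers of truth, one per rule (IDP values)

def toy_fes(x, weights):
    """Hypothetical stand-in for Mamdani inference: weighted mean of rule centers."""
    m = sum(x) / len(x)
    fire = [w * math.exp(-((m - c) ** 2) / 0.5) for w, c in zip(weights, CENTERS)]
    total = sum(fire)
    return sum(f * c for f, c in zip(fire, CENTERS)) / total if total > 0 else 3.0

def objective(weights, fes, antecedents, synthetic):
    """Euclidean norm of the difference between FES outputs and synthetic opinions."""
    return math.sqrt(sum((fes(x, weights) - s) ** 2
                         for x, s in zip(antecedents, synthetic)))

def run_ga(fitness, n_genes, rng, pop_size=40, generations=50,
           crossover_fraction=0.8, tol=1e-6):
    """Minimal GA over weights in [0, 1]; stops early once fitness < tol."""
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        if fitness(best) < tol:
            break
        ranked = sorted(pop, key=fitness)
        parents = ranked[: pop_size // 2]
        children = [ranked[0][:]]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < crossover_fraction:
                t = rng.random()
                child = [t * ga + (1 - t) * gb for ga, gb in zip(a, b)]
            else:
                child = a[:]
            # Gaussian mutation, clamped to the admissible weight range.
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05))) for g in child]
            children.append(child)
        pop = children
        best = min(pop, key=fitness)
    return best

rng = random.Random(1)
N_EXPERTS = 10  # hypothetical number of synthetic experts per rule
synthetic, antecedents = [], []
for c in CENTERS:
    for _ in range(N_EXPERTS):
        synthetic.append(rng.gauss(c, 0.3))  # assumed sigma
        lo, hi = max(1.0, c - 1.0), min(5.0, c + 1.0)
        antecedents.append([rng.uniform(lo, hi) for _ in range(3)])

def fitness(w):
    return objective(w, toy_fes, antecedents, synthetic)

best_weights = run_ga(fitness, len(CENTERS), rng)
print("optimized weights:", [round(w, 3) for w in best_weights])
print("objective:", round(fitness(best_weights), 3))
```

Because the best individual is carried over unchanged in every generation, the best objective value in this sketch cannot increase from one generation to the next.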
Related work: Synthetic data generation methods have been considered across distinct domains such as synthetic time series generation [4], synthetic electronic medical records (of a fairly critical nature) [5], vehicle image domain randomization [6], use of Kriging and Radial Basis Functions in Bayesian networks [7], and synthetic keystroke dynamics [8]. In these works, the common motivation was largely the need to compensate for the lack of real-world data.
Decision-making is a key step in all business domains. In turn, domain expertise is the key to decision-making. Many niche domains lack sufficient expertise, either because of an insufficient number of experts or of knowledge itself. While an accurate FES could possibly compensate for this, the challenge lies in achieving accuracy as well as robustness of the resulting opinion(s). In our effort to address this challenge, ideas have been drawn from the authors' own expertise as well as sources such as the excellent treatise by Ericsson et al. [9]. In particular, the present method's idea of sampling about a center of truth could be seen to correlate well with the findings of Salas et al. [10], who noted that experts are characterized by a shared mental model in their domain.

Method validations
To validate MOSY, we assessed the agreement, a correlation coefficient lying between 0 and 1, between the vector of real-world opinions derived from questionnaire responses from a set of human experts and the vector of opinions from the FES, as defined by (2). Each questionnaire has two response components: (1) a set of background factor ratings and (2) the expert opinion rating. The background factor ratings serve to determine the FES opinion vector in validation, whereas the human opinion vector consists of the expert opinion ratings. Precisely, from a single expert's vector of background factor ratings, a defuzzified outcome (consequent) is produced by the FES; the background factor ratings from all experts thus yield the FES opinion vector.
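As an illustration of how agreement between the two opinion vectors might be quantified, the sketch below computes the Pearson correlation coefficient. This is an assumption: the exact form of (2) is not reproduced in this text, and the Pearson coefficient ranges from -1 to 1 rather than from 0 to 1. The function name and the ratings in the example are hypothetical and are not taken from the questionnaires.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equally long vectors (assumed stand-in for (2))."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical ratings: human expert opinions and defuzzified FES outcomes.
human = [4, 4, 3, 5, 4, 2]
fes = [3.8, 4.1, 3.2, 4.6, 3.9, 2.5]
print("agreement:", pearson(human, fes))
```

A value close to 1 would indicate that the FES outcomes follow the pattern of the human ratings closely.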
MOSY is validated in two distinct real-world domains: an Industrial Development Project (IDP) and Passenger Car Performance (PCP). All computations were performed using MATLAB 2021b with the fuzzy logic and optimization toolboxes on an HP Core i5 8th Gen notebook.
IDP
Significance: In-house development projects play a major role in the growth of all industries. Outcomes in any project's domain are influenced by various background factors. As development involves significant time and capital, it is natural for management to seek prior estimates of the outcome measures against the background factors that are as accurate as possible. Although domain experts might be able to provide such estimates, they are often few in number. One such domain is that of aerospace control systems, where expertise is highly limited, as discussed next.
Human expert opinions: The questionnaire in Table A.1 was sent to 105 experts, and opinion responses were received from 61 of them, which is thus the length of the human opinion vector. These human opinions need to be of "expert character" for the validation to be meaningful, and such character is "normal-like", as reasoned in 'Method details'. Accordingly, we first examined the frequency distribution of the questionnaire responses, shown in Fig. 5, in which the numbers of responses are plotted for IN, KC, CA, SA, and QD. In Fig. 5, a "dominant-high-frequency" can be noted for each of the five outcome measures. For instance, for KC and QD, a rating of 4 is shared by as many as 42 and 39 responses (69% and 64%), respectively. Because such a "dominant-high-frequency" is characteristic of a normal distribution, we infer that the received responses collectively reflect expertise. Indeed, at a 5% significance level, the χ² goodness-of-fit test confirms the "normal" character of all five distributions.
FES opinions: Table A.3 gives the rules matrix used to produce each element of the FES opinion vector. Four memberships are adopted for each fuzzy variable (antecedent/consequent): VH (very high), H (high), L (low) and VL (very low). All membership functions are Gaussian, with a common support of (1, 5), as seen in Fig. A.1. The functions VL, L, H and VH have peaks at 1, 2, 4 and 5, respectively. Importantly, this study does not consider a membership function corresponding to what would be termed "average", with a peak at 3. This is because, in the authors' belief, an opinion of "average" character would be of poorer value than any of VL, L, H and VH.
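A weighted Mamdani inference over these four Gaussian memberships might look like the following Python sketch. It is not the authors' MATLAB implementation: the membership width `SIGMA`, the two-antecedent rule set `RULES` and the function names are hypothetical, and the rule weight is assumed to scale the firing strength, with min for AND and implication, max for aggregation, and centroid defuzzification. Only the peaks at 1, 2, 4 and 5 and the support (1, 5) come from the text.

```python
import math

PEAKS = {"VL": 1.0, "L": 2.0, "H": 4.0, "VH": 5.0}  # peaks stated in the text
SIGMA = 0.5              # assumed membership width (not given in the text)
Y_MIN, Y_MAX = 1.0, 5.0  # common support

def gauss_mf(x, label):
    """Gaussian membership value of x for the given linguistic label."""
    return math.exp(-((x - PEAKS[label]) ** 2) / (2 * SIGMA ** 2))

def mamdani(x, rules, weights, steps=400):
    """Weighted Mamdani inference with centroid defuzzification.
    rules: list of (antecedent_labels, consequent_label) pairs."""
    # Firing strength: min over antecedents (AND), scaled by the rule weight.
    strengths = [w * min(gauss_mf(xi, lab) for xi, lab in zip(x, ante))
                 for (ante, _), w in zip(rules, weights)]
    num = den = 0.0
    for k in range(steps + 1):
        y = Y_MIN + (Y_MAX - Y_MIN) * k / steps
        # Implication by min, aggregation by max.
        mu = max(min(s, gauss_mf(y, cons)) for s, (_, cons) in zip(strengths, rules))
        num += mu * y
        den += mu
    return num / den if den > 0 else (Y_MIN + Y_MAX) / 2

# Hypothetical rule set with two antecedents per rule.
RULES = [(("VL", "VL"), "VL"), (("L", "L"), "L"),
         (("H", "H"), "H"), (("VH", "VH"), "VH")]

print("all inputs very high:", mamdani([5.0, 5.0], RULES, [1.0, 1.0, 1.0, 1.0]))
print("all inputs very low: ", mamdani([1.0, 1.0], RULES, [1.0, 1.0, 1.0, 1.0]))
```

Lowering the weight of a rule reduces its firing strength and hence its pull on the defuzzified outcome, which is the mechanism through which weight optimization can reshape the FES opinions.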
 ∶  is derived by sampling from the normal distributions shown in Fig. A.2 . These distributions are associated with the four opinions VH, H, L and VL in Table A.3 . The centers of truth are located at = 1 , 2 , 4 and 5 for VL, L, H and VH respectively. All four distributions are normalized to yield a unit area over 1 ≤ ≤ 5 .
Optimum rule weights: The GA was run with a population size of 500, a crossover fraction of 0.8, and a convergence tolerance of approximately 10⁻⁶ on the norm of the objective. To assess the influence of the number of experts per rule, runs were made for a total of 14 increasing values: 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150 and 250. For each value, the optimized weight vector and the correlation coefficient (refer to (2)) were obtained. Figure 6 shows the clear convergence of the objective with generations in the example case of 250 experts per rule. Finally, from the optimized weight vectors for all values, a rule-significance metric is computed, as defined by (3), from the vector that collects, for a given rule, the corresponding elements of the optimized weight vectors across all the numbers of experts per rule. This metric shows the relative significance of that rule among the rules. Table 1 gives the metric for each of the four rules (1, 2, 3 and 4) in the IDP for each of the five outcomes. From Table 1, in the case of KC, for instance, we see that rules 3 and 4 (1.0 and 0.8736, respectively) are far more significant than rules 1 and 2 (0.0002 and 0.0001, respectively).
Correlations: Figure 7 shows the correlations across the numbers of experts per rule for all five outcome measures of the IDP. The correlation is relatively high, ranging from 91.4% to 98.0% on average across the tested numbers of experts per rule. This shows that MOSY is effective in producing high agreement between synthetic and human expert opinions.
PCP
Significance: This domain is of significance to both prospective passenger car buyers and car sales executives. For instance, in a typical car dealer's showroom, a prospective customer would seek the vehicle model most suited to his/her usage needs. Usage needs would vary widely across age groups and travel cultures. For instance, a young customer might opt for high hauling power if they often travel through mountainous places. Similarly, for heavy use in commuting within a city, high fuel economy might be desired by a middle-aged customer with a moderately sized family.
It is reasonable to assume that both the buyer and the sales executive can have only a limited sense of the most appropriate vehicle model, given the complexity of such backgrounds and needs. In other words, expertise is typically limited in the "domain" of car choice against need. Accordingly, we considered this domain to test MOSY.
Human expert opinions: To obtain the human opinion vector, the questionnaire in Table A.2 was sent to a set of expert users of passenger cars, seeking opinions on a scale of 1 (fair) to 4 (excellent) for two performance parameters, namely Hauling Power (HP) and Fuel Economy (FE). These two parameters pertain to a set of nine associated background factors, of which only two, namely age and family size, are obtained directly from the expert user inputs, unlike in the IDP case, where all the background factors were obtained directly. The remaining seven background factors were computed by the process illustrated in Fig. 8. The directly obtained and the computed background factors are defined by (4).
Here, the computed factors are defined in terms of the truth vector for places visited, in which an element equals 1 for a place visited and 0 otherwise, and the category-weights matrix given in Table A.5. Specifically, for each category, the weight is a fraction in the (0, 1) interval, assigned from internet-acquired knowledge, and the weights sum to 1.0 for every place. The distance-based quantity is computed from (6), in which the vector of distances to the places is given in Table A.6. For each place considered in the questionnaire, the distance was obtained using Google Maps.
Of the 240 experts who were sent the questionnaire in Table A.2, opinion responses were received from 103, which is thus the length of the human opinion vector. The frequency distribution of the responses to the questionnaire is shown in Fig. 9, in which the numbers of responses are plotted for HP and FE. As in Fig. 5, Fig. 9 shows a "dominant-high-frequency" for both performance parameters. In the case of HP, for instance, the rating of 3 is shared by as many as 48 (47%) experts, whereas 43 (42%) user experts share the rating of 2 for FE. For both distributions, the χ² goodness-of-fit test confirms the "normal" character at a 5% significance level.
FES opinions: Table A.4 gives the rules matrix used to produce each element of the FES opinion vector. Four memberships are adopted for each fuzzy variable. The consequent has E (excellent), VG (very good), G (good), and F (fair) as memberships, whereas the antecedents have different membership nomenclatures relevant to the nine background factors; all membership functions, however, are Gaussian. The consequent memberships E, VG, G and F have peaks at 1, 2, 3, and 4 in the support (1, 4), whereas the antecedents have their support normalized by the maximum to unity, that is, (0, 1), with the peaks of the respective memberships located equidistantly within the support.
Synthetic opinions: The synthetic opinion vector is derived by sampling from normal distributions, as in the IDP domain. These distributions are associated with the four opinions E, VG, G and F in Table A.4, with the centers of truth located at 1, 2, 3, and 4, respectively.
Optimum rule weights: The GA runs adopted the same population size, crossover fraction, and convergence tolerance as for the IDP. Runs were made for a total of 11 increasing numbers of experts per rule: 10, 20, 30, 50, 60, 70, 80, 90, 100, 125 and 150. On convergence, from the optimized weight vector for each value, the rule-significance metric is computed as defined by (3). Table 2 gives the metric for each of the rules discussed for the PCP. From Table 2, in the case of HP, for instance, we see that rules 3 and 4 (1.0 and 0.8222, respectively) are far more significant than the other rules.
Correlations: Figure 10 shows the correlations across the numbers of experts per rule for both performance parameters of the PCP. The correlation is relatively high, ranging from 85.6% to 90.8% on average across the tested numbers of experts per rule. This again demonstrates that MOSY is effective in producing high agreement between synthetic and expert user opinions.

Convergence of CPU time per expert opinion: For every case, the CPU time for GA optimization was recorded. From this time, we determined the average CPU time per expert opinion, defined by (8). For both HP and FE, Fig. 11 shows this average time plotted against the number of experts per rule, together with three-parameter best-fit curves obtained using the MATLAB curve-fitting toolbox. A clear convergence is seen for both HP and FE. Such convergence is noteworthy as it indicates robustness of the solution, and it is reasoned as follows. With an increasing number of experts per rule, one could expect relatively more samples clustered around the centers of truth than away from them. Correspondingly, the "signal-to-noise" ratio, in which (with respect to a center of truth) "being near to" is signal and "being far from" is noise, would converge with an increasing number of samples. Indeed, the average CPU time per expert opinion measures this ratio. Finally, from Fig. 11, we see that 70 experts per rule would be the minimum needed to ensure robustness of the FES.

Summary
In many domains, decision-making is challenging because experts are often limited in availability. Without a sufficient number of expert opinions, however, the associated solutions would not be robust. Motivated by this, MOSY was developed to produce a robust FES by specifying the number of (synthetic) experts per rule. In the framework of MOSY, corresponding to every fuzzy rule, antecedent vectors were sampled from uniform distributions as many times as there are experts per rule, to produce as many human expert consequent opinions using the standard Mamdani inference process. Independently, corresponding to each fuzzy rule, as many synthetic consequent opinions were produced by sampling from normally distributed memberships. The norm of the vector of differences between the corresponding opinions was minimized by optimizing the associated rule weights using a GA. MOSY was tested against sets of human expert opinions in two distinct domains, the IDP and the PCP. The results show that the synthetic and human expert opinions correlated between 91.4% and 98.0% on average for 5 to 250 synthetic experts per rule, across five outcomes of the IDP. Likewise, for the PCP, the respective correlations varied between 85.6% and 90.8% for 10 to 150 synthetic experts per rule, across two performance measures. These strong correlations lead us to conclude that MOSY is capable of producing synthetic opinions to yield a robust FES where sufficient human experts are not available.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.