1 Introduction

A milestone date for Computing With Words (CWW) is 1996, when Zadeh published his seminal paper (Zadeh 1996). Since that time, many papers have been published on this subject [e.g., see Zadeh (1999, 2012), Mendel and Rajati (2015), which together contain more than 150 references]. At a high level, there are three aspects to CWW (e.g., Mendel 2001; Zadeh 1999; Mendel and Wu 2010): (1) model words as fuzzy sets (FSs); (2) operate on those FSs to solve a CWW problem (using the mathematics of FSs), the result being a CWW problem–solution FS; and, (3) convert the CWW problem–solution FS back into a word. This paper is only about the first aspect of CWW, namely obtaining a FS model for a word, which is sometimes also referred to as “calibrating the fuzzy sets” (Ragin 2008, Ch. 5).

It is well known that fuzzy sets are meant for linguistic terms of a linguistic variable that are naturally ordered. Examples of variables that are naturally ordered are temperature, pressure, height, profit, etc.; examples of variables that are not are beauty, illness, feelings, etc. It should be clear from these examples that a variable that is “naturally ordered” is one whose domain is a naturally ordered set, i.e. a set that is equipped with a linear order relation. An example of such a set is the set of real numbers. This paper focuses only on such naturally ordered sets because they are the ones that are used in CWW.

One of the things that distinguishes obtaining a FS model for a word in CWWs is that the words must come before the FS model (Mendel and Wu 2010, Ch. 11), whereas in function approximation applications of FSs (e.g., fuzzy logic control, signal processing, classification, clustering, etc.), it does not really matter what the FSs are called, because they are used as a means to an end, the end usually being a number or a class.

There are two approaches to obtaining a FS model for a word in CWW: (1) specify the FS without collecting data about the word, and (2) estimate the FS after collecting data about the word. My own background from working in the fields of estimation theory and statistical signal processing has led me and my students to focus on the second approach, and has been driven by the following principles:

1. The complete vocabulary of all of the words that will be used in the CWW application must be established before FS models are found for the words.

2. The FS model should be based on data that are collected from a group of subjects, so as to account for the intra- and inter-uncertainties about the word.

3. Words contain linguistic uncertainties, i.e. words mean different things to different people, and so the FS model that is used to represent a word must be able to incorporate both of these uncertainties.

4. The data collection must not introduce methodological uncertainties, because such uncertainties cannot be separated from the linguistic uncertainties.

5. The nature of the FS model [i.e., should it be a left (right) shoulder or an interior MF?] should be determined by the data and not be chosen a priori.

6. Because words must also mean similar things to different people, or else effective communication is not possible, the collected data must be pre-processed to enforce this.

Regarding Item 1: the size of the vocabulary (i.e., the number of linguistic terms in T) for a linguistic variable v will affect the calibration of the fuzzy sets. If, for example, only three linguistic terms are used to describe Profitable, namely {Hardly Profitable, Moderately Profitable, Fully Profitable}, then their fuzzy sets will look very different from their fuzzy sets when the following six terms are used: {Barely Profitable, Hardly Profitable, Somewhat Profitable, Moderately Profitable, Fully Profitable, Extremely Profitable}. This is because the term Barely Profitable now appears before Hardly Profitable, and the term Extremely Profitable now appears after Fully Profitable. So, knowing the complete vocabulary for all of the linguistic variables is crucial to the proper modeling of the words in CWWs.

Regarding Item 2: intra-uncertainty of a word is the uncertainty each individual has about the word, whereas inter-uncertainty of a word is the uncertainty that a group of subjects has about the word (Mendel and Wu 2010, Ch. 3). Since a CWW application is often aimed at a group of end-users, it is important to capture both kinds of linguistic uncertainties so that those uncertainties can flow through the CWW computations that lead to a CWW judgment or solution. It is data that are collected from a group of subjects that let us capture both kinds of uncertainties.

Regarding Item 3: we use interval type-2 fuzzy sets (IT2 FSs) as the word model, because the membership function (MF) of a type-1 (T1) FS does not provide the flexibility to simultaneously incorporate both kinds of linguistic uncertainties (Mendel 1999). In Mendel (2003, 2007) we explain that it is therefore scientifically incorrect to model a word using a T1 FS, and in Mendel and John (2002) we explain that an IT2 FS is a first-order uncertainty model for a word, because all of its secondary grades are the same. These IT2 FS word models can be thought of as first-order word granules (Bargiela and Pedrycz 2003; Pedrycz 2013).

Regarding Item 4: we never ask a subject to provide a MF or a footprint of uncertainty (FOU), because only people who are already knowledgeable about FSs can do this. We try to collect data from potential end-users of the CWW application/product, and they, for the most part, have no idea what a FS (FOU) is. Even if we were to explain what a FS (FOU) is to such subjects (e.g., in a short tutorial), most would still not understand it; hence, asking for their FS or FOU will introduce methodological uncertainties, and such uncertainties cannot be separated from the linguistic uncertainties.

Regarding Item 5: we don’t really know ahead of time whether a word should be modeled as a shoulder MF or as an interior MF, and so we do not decide this ahead of time. Instead, we let the data inform us about which of these FS models to use. This is analogous in probability to not choosing the nature of the pdf ahead of time.

Regarding Item 6: there is an adage in signal processing: garbage in, garbage out. We believe in this adage, and, to avoid garbage at the output of a CWW application/product, we preprocess the data that are collected from a group of subjects so that the resulting word models are very likely to also be representative of subjects outside of the group that provided the word data. We admit that we may be discarding some outlier data, but our CWW products are not meant for the exceptions. A different approach, one that does not discard any data, would have to be taken if covering such exceptions is an objective (e.g., Wagner et al. 2013).

Early works on mapping data into FOUs did not adhere to all of the above principles (Mendel and Wu 2006, 2007a, b). At the time of those works these principles had not yet been formulated, and, because those works required that the FOU be chosen ahead of time, they are not described herein.

In the rest of this paper, three approaches that adhere to all of these principles are described and compared. The names of these approaches are: Interval Approach (IA), Enhanced Interval Approach (EIA), and Hao–Mendel Approach (HMA for short). Each approach has two parts, a data part (DP) and a fuzzy set part (FSP). Preprocessing is done in the DP, ultimately leading to a smaller subset of data intervals that are then mapped in the FSP into the FOU of an IT2 FS. Each of these approaches extracts more and more information from the collected data, leading to FOUs with less and less uncertainty.

Having three approaches for using data collected from subjects to model words by means of IT2 FSs, all coming from Mendel and his students, may be very confusing to CWW researchers. Additionally, there is a tendency for researchers who are new to a field to use the first published results (in this case the IA), because they may not realize that better results are available. The importance of this paper is that, for the first time, it compares these three approaches side-by-side and explains why the HMA should be the preferred approach for obtaining IT2 FS word models. This will be of value to all CWW researchers who need to choose an approach for modeling words.

To begin, we explain our data collection method(s).

2 Data collection

We require that each linguistic variable have a vocabulary of linguistic terms assigned to it by one or more experts (practitioners, designers), where the linguistic terms fit a naturally ordered scale; there can be as many linguistic terms for each linguistic variable as desired. The vocabulary and its size must be made known to the subjects (or single subject) when data are collected from them during the calibration process.

When data are collected from a group of n subjects, they are asked a question like (Hao and Mendel 2014, 2015; Liu and Mendel 2008; Mendel and Wu 2010; Wu et al. 2012): suppose that a word can be located on a scale of l to r; on that scale, what are the end-points of the interval that you associate with the word? We have administered this kind of survey many times and have found that most people have no trouble answering this question. For each word, the ith subject provides interval end-points \( a^{(i)} \) and \( b^{(i)} \), so the group of n subjects provides \( \{[a^{(i)}, b^{(i)}]\}_{i=1}^{n} \).

When data are collected from a single subject, they are asked two similar questions (Mendel and Wu 2014): suppose that a word can be located on a scale of l to r, and you want to locate the end-points of the interval that you associate with the word on that scale, but you are unsure of these two end-points: (Q1)[(Q2)] On the scale of l to r, what are the end-points of an interval of numbers that you associate with the left [right] end-point of the word? A single subject provides \( [a_{L} ,b_{L} ] \) and \( [a_{R} ,b_{R} ] \) for each linguistic term. Starting with \( [a_{L} ,b_{L} ] \) and \( [a_{R} ,b_{R} ] \), we (Mendel and Wu 2014) proceed as follows (a code sketch of the first two steps appears after the list):

(1) Assume that \( [a_{L} ,b_{L} ] \) and \( [a_{R} ,b_{R} ] \) are each uniformly distributed, and then compute the mean and variance for both of them.

(2) Assign the mean and variance of \( [a_{L} ,b_{L} ] \) and \( [a_{R} ,b_{R} ] \) to uniform probability distributions and generate 100 random numbers \( (L_{1} ,L_{2} , \ldots ,L_{50} ;R_{1} ,R_{2} , \ldots ,R_{50} ) \). Form 50 end-point interval pairs from these random numbers \( \{ (L_{1} ,R_{1} ), \ldots ,(L_{50} ,R_{50} )\} \), thereby creating interval end-point data from 50 “virtual” subjects.

(3) Apply the IA, EIA or HMA method to the 50 intervals to obtain the (Person) FOU for the word. This FOU only accounts for a person’s intra-uncertainty about the word.
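A minimal sketch of steps (1) and (2), in Python/NumPy, is given below. The function name, the random-number generator and the illustrative end-point values are ours, not part of Mendel and Wu (2014); how pairs with L ≥ R are handled is not specified here, so they are simply left for the bad-data stage (DP1) of Sect. 3.1 to deal with.

import numpy as np

def virtual_subject_intervals(a_L, b_L, a_R, b_R, n_virtual=50, seed=0):
    """Turn one subject's end-point intervals [a_L, b_L] and [a_R, b_R]
    into n_virtual "virtual subject" data intervals (steps (1)-(2)).
    Sampling uniformly on [a_L, b_L] and [a_R, b_R] matches the mean and
    variance of the uniform distributions referred to in step (2)."""
    rng = np.random.default_rng(seed)
    L = rng.uniform(a_L, b_L, n_virtual)  # 50 left end-points
    R = rng.uniform(a_R, b_R, n_virtual)  # 50 right end-points
    # Pair the draws; any pair with L >= R is left for Stage DP1 to remove.
    return list(zip(L, R))

# Illustrative (made-up) end-point data for one word on a 0-10 scale:
intervals = virtual_subject_intervals(2.5, 3.5, 6.0, 7.5)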

Using the additional data that are provided by a single subject, this three-step procedure reduces the single-subject case to the group case, after which the IA, EIA or HMA is applied exactly as it is when data are collected from a group of subjects. Consequently, in the rest of this paper we focus only on comparisons of the IA, EIA and HMA when data are collected from a group of n subjects. Additionally, we use \( l = 0 \) and \( r = 10 \).

3 Data part

The DP of the IA, EIA and HMA has five stages. Each stage is summarized in its own table, and the readers are advised to examine each table before reading the rest of a sub-section.

3.1 Bad data processing (Table 1)

As is stated in Liu and Mendel (2008), Mendel and Wu (2010) this preprocessing removes nonsensical results, because some subjects do not take a survey seriously and so provide useless results (e.g., interval end points that fall outside of [0, 10]). Observe, in Table 1, that the IA requirement \( b^{(i)} \ge a^{(i)} \) is replaced in the EIA and HMA by \( b^{(i)} - a^{(i)} < 10 \), because the former does not constrain a subject to an interval of maximum length \( r - l = 10 \), whereas the latter does. Observe, also, that \( n^{\prime} \le n \) data intervals survive Stage DP1.
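A minimal code sketch of Stage DP1 (in Python, continuing the notation above) is the following; the exact strict/non-strict inequalities of Table 1 are not reproduced here, so this is only indicative.

def stage_dp1(intervals, l=0.0, r=10.0):
    """Stage DP1 (bad-data processing): keep intervals whose end-points lie
    on [l, r], with b > a and length less than r - l (the EIA/HMA condition)."""
    kept = [(a, b) for a, b in intervals
            if l <= a <= r and l <= b <= r and b > a and (b - a) < (r - l)]
    return kept  # n' <= n intervals survive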

Table 1 Stage DP1: bad data processing

3.2 Outlier processing (Table 2)

This preprocessing stage uses a Box and Whisker test (Walpole et al. 2007) to eliminate outliers. Observe, in Table 2, that the IA uses all \( n^{\prime} \) intervals in all three of its tests, whereas the EIA and HMA first use the \( n^{\prime} \) intervals on interval endpoint outlier tests, leading to \( n^{\prime\prime} \le n^{\prime} \) intervals, after which a third outlier test is performed on the lengths of those intervals, leading to \( m^{\prime} \le n^{\prime\prime} \) intervals. Since the length of a data interval is computed using its two end-points, separating the three tests as is done in the EIA and HMA is better than using the same intervals in all three outlier tests. Observe, also, that \( m^{\prime} \le n^{\prime} \) data intervals survive Stage DP2.
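The sketch below illustrates the EIA/HMA version of Stage DP2; the whisker factor 1.5 and the quartile convention are the usual Box-and-Whisker choices and are assumptions here, since the exact specification of Table 2 is not reproduced.

import numpy as np

def box_whisker_keep(values, k=1.5):
    """Boolean mask of values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)

def stage_dp2_eia_hma(intervals):
    """Stage DP2, EIA/HMA style: end-point outlier tests first (n'' survivors),
    then a length outlier test on those survivors (m' survivors)."""
    data = np.asarray(intervals, dtype=float)
    data = data[box_whisker_keep(data[:, 0]) & box_whisker_keep(data[:, 1])]
    data = data[box_whisker_keep(data[:, 1] - data[:, 0])]
    return [tuple(row) for row in data]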

Table 2 Stage DP2: outlier processing

3.3 Tolerance limit processing (Table 3)

A tolerance interval is a statistical interval within which, with confidence level \( 100(1 - \gamma )\,\% \), a specified proportion \( (1 - \alpha ) \) of a sampled population falls (Walpole et al. 2007). It lets one be confident that the data intervals that would have been provided by subjects who were not surveyed will also be satisfactory. Parameter k in Table 3 has to be chosen, and values for it can be found in Walpole et al. (2007, Table A.7) for \( 1 - \alpha = 0.90, 0.95, 0.99 \), \( \gamma = 0.05, 0.01 \) and selected sample sizes [\( m^{\prime} \) (\( m^{ + } \))] from 2 to 300; e.g., if \( 1 - \alpha = 0.95 \), \( \gamma = 0.05 \) and the sample size is 30, then \( k = 2.549 \).

Table 3 Stage DP3: tolerance limit processing

Observe that the IA uses all \( m^{\prime} \) intervals in all three of its tests, whereas the EIA and HMA first use the \( m^{\prime} \) intervals in interval end-point tolerance limit tests, leading to \( m^{ + } \le m^{\prime} \) intervals, after which a third tolerance limit test is performed on the lengths of those intervals, leading to \( m^{\prime\prime} \le m^{ + } \) data intervals. Since the length of a data interval is computed using its two end-points, separating the three tests as is done in the EIA and HMA is better than using the same intervals in all three tolerance limit tests. Observe, also, that \( m^{\prime\prime} \le m^{\prime} \) data intervals survive Stage DP3.
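A corresponding sketch of the EIA/HMA version of Stage DP3 follows; in practice k is looked up again for the current number of surviving intervals [\( m^{\prime} \), then \( m^{ + } \)], whereas this sketch simply passes in one value.

import numpy as np

def tolerance_keep(values, k):
    """Boolean mask of values inside the tolerance interval mean +/- k*std."""
    m, s = np.mean(values), np.std(values, ddof=1)
    return (values >= m - k * s) & (values <= m + k * s)

def stage_dp3_eia_hma(intervals, k=2.549):
    """Stage DP3, EIA/HMA style: end-point tolerance limit tests first
    (m+ survivors), then a length tolerance limit test (m'' survivors).
    k = 2.549 corresponds to 1 - alpha = 0.95, gamma = 0.05 and about
    30 data (Walpole et al. 2007, Table A.7)."""
    data = np.asarray(intervals, dtype=float)
    data = data[tolerance_keep(data[:, 0], k) & tolerance_keep(data[:, 1], k)]
    data = data[tolerance_keep(data[:, 1] - data[:, 0], k)]
    return [tuple(row) for row in data]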

3.4 Reasonable interval processing (Table 4)

According to Liu and Mendel (2008), Mendel and Wu (2010, Ch. 3), a data interval is said to be reasonable if it overlaps with another data interval. Reasonable interval processing removes data intervals that have no overlap or too little overlap with other data intervals, based on a probability analysis that enforces the maxim (Liu and Mendel 2008; Mendel and Wu 2010, Ch. 3) “words must mean similar things to different people” (for effective communication to occur). In Table 4, \( \hat{\xi }^{*} \) is one of the following two values:

$$ \hat{\xi }^{*} = \frac{{(\hat{m}_{b} \hat{\sigma }_{a}^{2} - \hat{m}_{a} \hat{\sigma }_{b}^{2} ) \pm \hat{\sigma }_{a} \hat{\sigma }_{b} [(\hat{m}_{a} - \hat{m}_{b} )^{2} + 2(\hat{\sigma }_{a}^{2} - \hat{\sigma }_{b}^{2} )\ln (\hat{\sigma }_{a} /\hat{\sigma }_{b} )]^{1/2} }}{{\hat{\sigma }_{a}^{2} - \hat{\sigma }_{b}^{2} }} $$
(1)

such that \( \hat{m}_{a} \le \hat{\xi }^{*} \le \hat{m}_{b} \). Note that \( \hat{\xi }^{*} \) in (1) is derived by solving the probability problem:

$$ \xi^{*} = \arg \min_{\xi } [P(a^{(i)} > \xi ) + P(b^{(i)} < \xi )] $$
(2)

in which the interval end-points are assumed to be independent and normally distributed (Fig. 1) (Liu and Mendel 2008; Mendel and Wu 2010, Ch. 3).
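The sketch below evaluates (1) and the corresponding EIA/HMA test; taking \( a^{*} = 2\hat{m}_{a} - \hat{\xi }^{*} \) and \( b^{*} = 2\hat{m}_{b} - \hat{\xi }^{*} \) is our reading of Fig. 1 (the level t cuts each normal pdf at two points symmetric about its mean) and should be checked against Table 4.

import numpy as np

def reasonable_interval_thresholds(intervals):
    """Estimate xi* from Eq. (1), choosing the root with m_a <= xi* <= m_b,
    and mirror it about m_a and m_b to get a* and b* (assumed, per Fig. 1)."""
    data = np.asarray(intervals, dtype=float)
    a, b = data[:, 0], data[:, 1]
    ma, mb = a.mean(), b.mean()
    sa, sb = a.std(ddof=1), b.std(ddof=1)
    if np.isclose(sa, sb):
        xi = (ma + mb) / 2.0  # equal-variance case: the two pdfs cross midway
    else:
        disc = (ma - mb) ** 2 + 2.0 * (sa ** 2 - sb ** 2) * np.log(sa / sb)
        roots = ((mb * sa ** 2 - ma * sb ** 2)
                 + np.array([1.0, -1.0]) * sa * sb * np.sqrt(disc)) / (sa ** 2 - sb ** 2)
        inside = roots[(roots >= ma) & (roots <= mb)]
        xi = inside[0] if inside.size else roots[np.argmin(np.abs(roots - (ma + mb) / 2.0))]
    return xi, 2.0 * ma - xi, 2.0 * mb - xi

def stage_dp4_eia_hma(intervals):
    """Keep only intervals with a* < a < xi* < b < b* (EIA/HMA test)."""
    xi, a_star, b_star = reasonable_interval_thresholds(intervals)
    return [(a, b) for a, b in intervals
            if a_star < a < xi < b < b_star]  # m <= m'' intervals survive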

Table 4 Stage DP4: reasonable interval processing
Fig. 1 Reasonable interval tests: for the IA, reasonable intervals must have \( a^{(i)} < \xi^{*} < b^{(i)} \), and for the EIA and HMA, reasonable intervals must have \( a^{*} < a^{(i)} < \xi^{*} < b^{(i)} < b^{*} \) (Wu et al. 2012: © 2012, IEEE)

After the IA was published it was noticed that, in many situations, the remaining overlapping intervals extended too far to the left and/or to the right, which is also indicative of words not meaning similar things to different people. Observe, in Fig. 1, that \( \xi^{*} \) occurs at the intersection of \( p(a^{(i)} ) \) and \( p(b^{(i)} ) \); this intersection occurs at level t, and that level also intersects \( p(a^{(i)} ) \) at \( a^{*} \) and \( p(b^{(i)} ) \) at \( b^{*} \). This additional information was always available from the solution to (2), but it was not used in the IA; by using it, the EIA and HMA are able to control the extensions of the remaining overlapping intervals. The Table 4 EIA and HMA test therefore leads not only to overlapping intervals but also to intervals that do not extend too far to the left and/or to the right. Observe, also, that at the end of Stage DP4 \( m \le m^{\prime\prime} \) data intervals have survived.

3.5 Statistical information processing (Table 5)

Statistical information processing extracts, from the remaining m data intervals, the statistical information that is used in the FSP of each method. In the IA and EIA, each of the remaining data intervals is assumed uniformly distributed so that its mean and standard deviation can be computed; these two statistics capture the intra-uncertainty of a word. In the HMA, one-sided tolerance intervals are computed for the data interval end-points, and, because group statistics are used to do this, these tolerance intervals capture the inter-uncertainty of a word.
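In code, the two kinds of Stage DP5 statistics can be sketched as follows; the direction of each one-sided limit (a lower limit for the left end-points, an upper limit for the right end-points) and the symbol k1 for the one-sided tolerance factor are our reading of the HMA, and the factor itself must be looked up for m data.

import numpy as np

def dp5_ia_eia_stats(intervals):
    """IA/EIA: per-interval mean and standard deviation, assuming each
    surviving interval [a, b] is uniformly distributed."""
    return [((a + b) / 2.0, (b - a) / np.sqrt(12.0)) for a, b in intervals]

def dp5_hma_stats(intervals, k1):
    """HMA: one-sided tolerance limits on the two sets of end-points,
    computed from group statistics (k1 = one-sided tolerance factor)."""
    data = np.asarray(intervals, dtype=float)
    a, b = data[:, 0], data[:, 1]
    a_lower = a.mean() - k1 * a.std(ddof=1)   # \underline{a}
    b_upper = b.mean() + k1 * b.std(ddof=1)   # \bar{b}
    return a_lower, b_upper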

Table 5 Stage DP5: statistical information processing

4 Fuzzy set part

The FSP of the IA, EIA and HMA can be compressed into two major stages, (1) establish the nature of the FOU and (2) establish the FOU. In this section, each stage is summarized in its own table, and the readers are again advised to examine those tables before reading the rest of a sub-section.

4.1 Establish nature of the FOU (Table 6)

In the IA and EIA the nature of the FOU is established by means of a classification procedure whose rules are found by assuming that the m data intervals can either be mapped into an interior FOU, or if this cannot be done, into a left-shoulder FOU, or if this cannot be done, into a right-shoulder FOU. The derivation of the classification procedure is complicated, uses the Mendel–John Representation Theorem (Mendel and John 2002) in reverse, and leads to the Fig. 2 classification diagram. For each word, its sample mean pair \( (\hat{m}_{l} ,\hat{m}_{r} ) \), computed for the m surviving intervals, locates a point on this diagram within one of the three shaded regions; this leads to a classification of the word’s IT2 FS model as either an interior, left-shoulder or right-shoulder FOU. Once this has been established it is then known which T1 FS each of the m surviving data intervals will be mapped into.

Fig. 2 Classification diagram with three FOU decision regions. The Unreasonable Region is so named because it would be unreasonable for a word shoulder FOU to extend so far either to the left or to the right into that region (Liu and Mendel 2008: © 2008, IEEE)

Table 6 Stage FS1: establishing the nature of the FOU for the word W

The classification procedure in the HMA is very easy to obtain and is based on the one-sided tolerance intervals, \( \underline{a} \) and \( \bar{b} \), that were computed in Stage DP5.

At the end of this stage, the nature of the FOU for a word is known.
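As an illustration only, the following sketch shows the kind of decision the HMA makes; the specific rule used here, comparing the one-sided tolerance limits with the ends of the scale, is an assumption on our part, and the authoritative test is the one given in Table 6.

def classify_fou_hma(a_lower, b_upper, l=0.0, r=10.0):
    """Assumed HMA-style Stage FS1 decision based on the Stage DP5
    one-sided tolerance limits (see Table 6 for the actual test)."""
    if a_lower <= l:
        return "left-shoulder"
    if b_upper >= r:
        return "right-shoulder"
    return "interior"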

4.2 Establish the FOU (Tables 7, 8)

In the IA and EIA, formulas that are used to map each of the surviving intervals into its respective “embedded” T1 FS are given in Table 7. After bounding these T1 FSs from below and above, the resulting FOUs look like the ones in Fig. 3. Note that these FOUs are not chosen ahead of time but instead result from the two bounding procedures, which can be thought of as a direct application of the Mendel–John Representation Theorem (Mendel and John 2002). The difference in determining the LMFs in the IA and EIA is explained in Table 9.

Table 7 Formulas for the two parameters of each T1 FS (Wu et al. 2012)
Fig. 3 IA and EIA FOUs for (a) left-shoulder, (b) interior and (c) right-shoulder (Mendel and Wu 2010)

Table 8 Stage FS2: establishing the FOU

In the HMA (see Table 8), first the overlap of the m surviving intervals is computed, after which it is removed from each of those intervals, leading to one set of m shorter-length intervals for a shoulder FOU, but to two sets of m shorter-length intervals for an interior FOU.
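A sketch of this overlap step is given below, under the assumption that the overlap is the common intersection \( [\max_i a^{(i)}, \min_i b^{(i)}] \) of the m surviving intervals (non-empty after Stage DP4, since every surviving interval contains \( \xi^{*} \)); Table 8 should be consulted for which remnant set(s) a given FOU type actually uses.

import numpy as np

def remove_overlap(intervals):
    """Compute the assumed overlap of the m surviving intervals and return
    the two sets of shorter-length remnant intervals; a shoulder FOU uses
    only one of the two sets, whereas an interior FOU uses both."""
    data = np.asarray(intervals, dtype=float)
    o_l, o_r = data[:, 0].max(), data[:, 1].min()   # overlap = [o_l, o_r]
    left_parts = [(a, o_l) for a, _ in data]        # pieces left of the overlap
    right_parts = [(o_r, b) for _, b in data]       # pieces right of the overlap
    return (o_l, o_r), left_parts, right_parts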

Next, we come to the following major difference between the HMA and IA/EIA: In the HMA the shape of the FOU is assumed, ahead of time, whereas in the IA or EIA it is established by the bounding procedures just described. Both methods map two statistics about the m data intervals into MF parameters, but in very different ways.

The IA and EIA assume that each of the m surviving data intervals is uniformly distributed, which seems quite sensible since a subject is only asked for interval end-points and so there is no information collected that would let one assume anything other than a uniform distribution between those end-points. This assumption is needed so that the mean and standard deviation can be computed for each data interval. Those statistics are then mapped into the parameters of a T1 FS, after which all of the T1 FSs are bounded from below and above to provide the LMF and UMF of the FOU.
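For the interior-FOU case, this mapping amounts to moment matching: the uniform interval’s mean and standard deviation are equated to those of a symmetric triangular T1 FS. The sketch below reproduces that idea; the shoulder cases use different formulas, and Table 7 remains the authoritative source.

import numpy as np

def interior_t1_params(a, b):
    """Support [a_mf, b_mf] of a symmetric triangular T1 FS whose mean and
    standard deviation match those of a uniform interval [a, b]:
    uniform std = (b - a)/sqrt(12), triangle std = support/(2*sqrt(6))."""
    mean = (a + b) / 2.0
    std = (b - a) / np.sqrt(12.0)
    half_support = np.sqrt(6.0) * std
    return mean - half_support, mean + half_support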

In the HMA, once the overlap is removed, one begins with the sets of shorter-length data intervals, and, since one end of each of those intervals has been fixed by the overlap determination procedure, it is no longer valid to assume that the shorter intervals are uniformly distributed. One therefore computes the mean and standard deviation for the group of m shorter-length intervals, and, because these statistics describe the entire group of intervals, they have to be mapped into comparable uncertainty measures for the entire FOU. This requires the shape of the FOU to be pre-specified, and, because only two statistics are mapped into the entire FOU, the FOU must be completely specified by only two parameters, which is why the piecewise-linear FOUs shown in Fig. 4 were chosen.

Fig. 4 HMA FOUs for (a) left-shoulder, (b) interior and (c) right-shoulder

In the HMA, the average of the centroid of each triangular FOU is equated to the sample mean of the set of m shorter-length data intervals, and the upper bound of the standard deviation interval for each triangular FOU is equated to the sample standard deviation of the set of m shorter-length data intervals. Hao and Mendel (2015) obtain closed-form formulas for both the average of the centroid and the upper bound of the standard deviation of each triangular FOU, from which the HMA FOU parameter formulas given in Table 9 were obtained. Because an interior FOU begins with two sets of shorter data intervals, each of its triangular sub-FOUs (portions) has its two parameters computed independently. Comparing Fig. 4b with Fig. 4a, c, observe that the parameters for the right portion of the interior FOU are the same as the parameters of a left-shoulder FOU, and the parameters for the left portion of the interior FOU are the same as the parameters of a right-shoulder FOU.

Table 9 Formulas for the FOU parameters

5 Discussion

Because there are lots of details for the HMA, IA and EIA, it is easy to lose sight of how they account for linguistic uncertainties, namely the intra- and inter-uncertainties that were explained in Item 2 of Sect. 1. The HMA and IA/EIA account for these uncertainties in very different ways, and this is summarized in Table 10.

Table 10 Ways that a word’s intra- and inter-uncertainties are treated by the HMA and IA/EIA (Hao and Mendel 2015)

Software for the IA, EIA and HMA can be accessed at http://sipi.usc.edu/~mendel.

6 Example

Hao and Mendel (2015) have six examples that illustrate many aspects of the HMA. Examples 2 and 3 of that paper compare the HMA, EIA and IA FOUs for a 32-word vocabulary, all estimated from the same data intervals (collected from 175 subjects) that were used in Wu et al. (2012). Here, in Fig. 5, we show the FOUs for only three of the words, Low Amount, Moderate Amount, and High Amount, because many applications use these words. It is visually clear that the EIA FOUs are noticeably fatter than the HMA FOUs, and that the IA FOUs are noticeably fatter than the EIA FOUs, a clear indication that using more of the information in the data intervals reduces the MF uncertainties.

Fig. 5 FOUs for three linguistic terms obtained from the IA, EIA and HMA

7 Conclusions

This article has, for the first time, compared three methods for estimating (synthesizing) an IT2 FS model for a word, beginning with data that are collected from a group of subjects or from a single subject. It summarizes the stages of each method in tables, so that it is possible to compare the steps of each stage side-by-side. It also demonstrates, by means of an example involving three words, that using more of the information contained in the collected data intervals reduces the uncertainty in the IT2 FS model.

To this author, the preferred approach for mapping data intervals into an FOU is the HMA, because it uses more of the information contained in the collected data intervals than does the IA or the EIA, and because it is the only approach to date that leads to normal FOUs. Such FOUs are easier to compute with than non-normal FOUs.

Perhaps there is even more information contained in the collected data intervals. As is stated in Hao and Mendel (2015): “It is our belief that one should extract as much information from the collected data intervals as possible, and it is an observation made in this paper that as more and more information is extracted from the data the less uncertain are the word IT2 FS models. Formulating this connection between information, IT2 FSs and the uncertainties of their MFs in a precise mathematical way is an open research problem.”