Boosted Fuzzy Granular Regression Trees

The regression problem is a valued problem in the domain of machine learning, and it has been widely employed in many fields such as meteorology, transportation, and materials. Granular computing (GrC) is a good approach to exploring human intelligent information processing, which has the superiority of knowledge discovery. Ensemble learning is easy to execute in parallel. Based on granular computing and ensemble learning, we equivalently convert the regression problem into the granular space to solve it and propose boosted fuzzy granular regression trees (BFGRT) to predict a test instance. The thought of BFGRT is as follows. First, a clustering algorithm with automatic optimization of cluster centers is presented. Next, in terms of the clustering algorithm, we employ MapReduce to implement fuzzy granulation of the data in parallel. Then, we design new operators and metrics of fuzzy granules to build a fuzzy granular rule base. Finally, a fuzzy granular regression tree (FGRT) in the fuzzy granular space is presented. In light of these, BFGRT can be designed by combining multiple FGRTs in parallel via random sampling of attributes and MapReduce. Theory and experiments show that BFGRT is accurate, efficient, and robust.


Introduction
Learning ability is the basic feature of human intelligence. Prediction is the ability of humans to judge the future based on learning, and it is also a concrete manifestation of human learning ability. Prediction takes two concrete forms, regression and classification, which are also among the substantive problems in machine learning, data mining, and statistics. How to train a learner on the basis of existing data is the primary research purpose of the regression problem. It can help people discover the laws of development and change of things from massive historical data, so as to make scientific and quantitative predictions about the future. In the classification problem, the target output takes values in a finite discrete space, and these values can be either ordered or disordered. In regression problems, the range of the output variable is ordered and continuous.
Research on regression problems based on ensemble learning has been a hot topic in machine learning research in recent years and has received extensive attention. Nevertheless, the application of ensemble learning to regression problems is still unsatisfactory and needs further research. The regression problem to be solved in reality often comes from a very complex socioeconomic system, and various intricate internal and external factors have linear or nonlinear effects on it. Some are inherent factors, and some are accidental factors. A single learner can only learn from a certain type of data, and it is difficult to get satisfactory learning results. Especially in the big data environment, traditional regression learning algorithms are not able to meet the learning requirements of massive complex data in terms of predictive performance and scalability. Although great progress has been made in the related theoretical research and technology of machine learning, how to continuously improve the generalization ability and learning efficiency of learners is still an important issue and a continuous pursuit of machine learning research. GrC is an effective method for simulating the problem-solving thinking of humans and processing analysis tasks of big data. It abstracts and divides complex problems into several simpler ones, which helps to better analyze and solve problems. Combining granular computing with ensemble learning to solve regression problems is a good idea.
As the first of the four main research directions in machine learning, ensemble learning trains several different single learners on the training dataset and then combines their respective prediction results as the final output. Therefore, in most cases, it can perform better than a single learner in generalization and stability [1]. The fact that a weak learner can be upgraded to a strong learner is one of the main theoretical foundations of ensemble learning. Kearns and Valiant gave the concepts of weak learning and strong learning from the perspective of classification problems [2]. Avnimelech and Intrator introduced the above concepts into the regression problem and gave a proof of the equivalence of strong and weak learning there [3]. Another major theoretical basis for ensemble learning is the "no free lunch" theory proposed by Wolpert [4]. The implementation of ensemble learning has received extensive attention from researchers and has achieved some research results. These results can be summarized into two categories: one is the direct strategy, and the other is the overproduce-and-choose approach. Liu and his colleagues proposed an ensemble learning method via negative correlation learning [5]. Fang et al. proposed a selective boosting ensemble learning algorithm [6]. Breiman obtained multiple different training datasets by repeatedly sampling the original sample dataset [7]. Schapire proposed the boosting method, whose main idea is to continuously strengthen the learning of "difficult samples" in the iterative learning process [8]. Ho presented the random subspace method, which uses different subsets of the feature space to train and generate multiple learners [9]. This method is different from bootstrap sampling, boosting, and cross-validation approaches and emphasizes the differences between different feature subsets. Breiman designed the output smearing method for regression problems.
The primary thought is to inject Gaussian noise into the output variable [10]. The same method is also used to manipulate input variables [11]. Gheyas and Smith presented a dynamic weighted combination method, which dynamically adjusts the corresponding weights according to the predictive performance of the individual learners [12]. The research of ensemble learning on regression problems started late, and there are relatively few research results in applications, such as power load forecasting [13, 14].
Granular computing has been a very popular research direction in the field of computational intelligence over the past few decades. The core task of GrC is to construct, represent, and process information granules. The information granule is the foundational element of GrC. It is a set of elements gathered together according to indistinguishability, similarity, or functionality. As a key component of knowledge representation and processing, information granules always appear with information granulation, and information granulation occurs in the process of abstracting data or inducing knowledge from information. Information (data) forms information granules through information granulation. The representation forms of information granules often include the interval [15], fuzzy set [16], and rough set [17]. The purpose of information granulation is to separate complex problems into several simpler problems. This way can make us capture the details of the problem. Information granulation and information granules infiltrate almost all human cognition, decision-making, and reasoning processes and are closely related to information granularity. For example, in daily life and work, people usually use different time intervals such as day, month, and year to granulate time to obtain information granules of different sizes, and the size of the formed information granules implies the level of information granularity used in granulation. Through the abstraction of the problem, "finer" and "more special" information granules are transformed into "coarser" and "more general" information granules, and "coarser" and "more general" information granules can be refined into "finer" and "more special" ones. GrC helps people analyze and view the same problem from quite different granularities through the transformation between information granules and finally find the most suitable level for analyzing and solving the problem.
The basic composition of GrC mainly includes three parts: the granule, the granular layer, and the granular structure [18]. Granules are the foundational elements of GrC models [19]. The granular layer is an abstract description of the problem space according to a certain granulation criterion [20]. The granular structure is an abstract description of all granular layers. One granulation criterion corresponds to one granular layer, and different granulation criteria correspond to multiple granular layers. This shows that people observe, comprehend, and solve problems from various views. All the interconnections between the granular layers form an interactive structure called the granular structure [21].
There are two basic problems in GrC: granulation and calculation based on granulation [22]. Granulation is the first step of GrC in solving a problem, and it is a process of constructing a problem-solving knowledge space. The human brain's cognitive process of external things is a typical granulation, that is, starting from an overall and rough cognition, through the continuous processing and refinement of information, it finally forms a partial and detailed analysis and reasoning. The granulation process mainly involves granulation criteria, granulation methods, granule descriptions, and other issues [23]. Granulation-based computing refers to solving the original problem or performing logical reasoning with granules as the object of operation and the granularity level as the carrier. It is mainly divided into two types: mutual reasoning between granules on the same granular layer and conversion between granules on different granular layers. After years of study and growth, many models of GrC and their extended models have been proposed. The most representative ones are three theoretical models: the fuzzy set model, the rough set model, and the quotient space model.

Fuzzy Set Model.
Zadeh presented fuzzy set theory in 1965. He pointed out that a single element always belongs to a set to some extent and may also belong to several sets to different degrees [16]. Under the fuzzy set theory system, he presented a GrC model on the basis of computing with words. The core idea of this method is to use words instead of numbers for calculation and reasoning, to achieve fuzzy reasoning and control of complex information systems, which is in line with human thinking. In addition, Wang employed natural language knowledge to establish a linguistic dynamic system based on computing with words and designed a computational theoretical framework for a linguistic dynamic system based on computing with words by fusing concepts and schemes from multiple fields [24].

Rough Set Model.
The degree to which an object belongs to a certain set varies with the granularity of the attribute. In order to better characterize the ambiguity of set boundaries, Pawlak proposed rough set theory in the 1980s. Its essential idea is to adopt indistinguishable relations (equivalence relations) to establish a division of the universe into equivalence classes so as to establish an approximation space. In the approximation space, an upper approximate set and a lower approximate set are employed to approximate a set with fuzzy boundaries [17]. Classic rough set theory is mainly aimed at complete information systems, where all feature values of the processed objects are known. In order to make rough set theory suitable for handling incomplete information systems, there are currently two main ways: one is to fill in the incomplete data, and the other is to extend the classic rough set model. Kryszkiewicz proposed an extended rough set model via tolerance relations [25]. Stefanowski and his colleagues presented an extended rough set model on the basis of asymmetric similarity relations and an extended rough set model based on quantitative tolerance relations [26]. Wang analyzed the shortcomings of the previous two extension models and designed a rough set model based on the restricted tolerance relation. He found that the tolerance relation and the asymmetric similarity relation are the two extremes of the indistinguishable-relation extension, that is, the condition of the tolerance relation is too loose, the condition of the asymmetric similarity relation is too tight, and the restricted tolerance relation lies between them [27]. Pawlak employed the idea that elements in the same equivalence class have the same membership function and explored the structure and granularity of knowledge granules [28].
Polkowski and Skowron adopted the rough mereology method, neural network technology, and the idea of knowledge granulation to design a rough neural computing model, which combines the division blocks of the rough set and the neural network to form an efficient neural computing method [29]. Peters and his colleagues employed the indistinguishability relation to divide the real line into multiple subintervals, divided a whole area into several grid units, regarded each unit as a granule, and proposed metrics between two information granules based on the adjacency relation and the containment relation, respectively [30].

Quotient Space Theory Model.
B. Zhang and L. Zhang presented the theory of quotient space when studying problem solving. They said that "the recognized feature of human intelligence is that humans can analyze and view the same problem from different granularities" [31]. The quotient space theory established a formal system of quotient structures and proposed a series of theories and approaches to solve problems in the fields of heuristic search, information fusion, reasoning, and path planning. There has been some related research and applications [32, 33].
In addition, many new models have been proposed, such as granular matrices for reduction and evaluation [34], the three-way decision model [35-37], and ensemble learning for big data based on MapReduce [38-44]. In this study, we adopt parallel granulation and ensemble learning based on MapReduce to solve the regression problem from the angle of granular computing and to enhance both regression performance and granulation efficiency.

Contributions
In this study, a regression problem is equivalently transformed into a solution in the fuzzy granular space, and BFGRT is constructed from the angle of GrC and ensemble learning to solve the regression problem. The main contributions are as follows.
(i) First, an adaptive clustering algorithm is proposed, which can adaptively find the optimal cluster centers. It is a global optimization algorithm that solves the problem that classic clustering algorithms rely heavily on the initial cluster centers and easily fall into local optimal solutions.
(ii) Second, parallel fuzzy granulation of data based on the above clustering algorithm combined with MapReduce is presented, which solves the problem of the high complexity of traditional granulation and enhances granulation efficiency.
(iii) Third, we define fuzzy granules and related metric operators, design a loss function, construct an individual fuzzy granular regression tree in the granular space by optimizing the loss function, and then integrate in parallel multiple fuzzy granular regression trees built from different attributes into a stronger learner based on MapReduce to accurately solve the regression problem.

The Regression Problem
The regression problem is divided into two processes: learning and prediction.
Suppose S = (X, A, V, h) is an information system, where X = {x_1, x_2, ..., x_n} is the instance set, A = {a_1, a_2, ..., a_m} is the attribute set, V = ∪_{a∈A} V_a is the set of ranges, V_a is the range of the attribute a, and h: X × A ⟶ V is an information function (it assigns a value to each instance on each attribute, namely, ∀x ∈ X, a ∈ A, h(x, a) ∈ V_a). Let X = {x_1, x_2, ..., x_n} and Y = {y_1, y_2, ..., y_n}, where y_i corresponds to the output of x_i (i = 1, 2, ..., n). The learning system constructs a model Y = f(X) based on the instance set. For a test instance x_{n+1}, the learning system can predict the corresponding output y_{n+1} by Y = f(X).
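To make the notation concrete, the information system and the learning/prediction split can be sketched in Python. The data, the linear form of f, and all names below are illustrative assumptions, not the paper's method:

```python
# Minimal sketch of the information system S = (X, A, V, h) and the
# learning/prediction processes of the regression problem.
X = [(1.0, 2.0), (2.0, 3.5), (3.0, 5.2)]   # instances x_1..x_n
A = ["a_1", "a_2"]                          # attribute set
Y = [3.1, 5.4, 8.0]                         # output y_i for each x_i

def h(x, a):
    """Information function h: X x A -> V, the value of x on attribute a."""
    return x[A.index(a)]

# A trivial learner f (least-squares line on attribute a_1), standing in
# for the model Y = f(X) built by the learning system:
n = len(X)
mean_x = sum(h(x, "a_1") for x in X) / n
mean_y = sum(Y) / n
slope = sum((h(x, "a_1") - mean_x) * (y - mean_y) for x, y in zip(X, Y)) \
        / sum((h(x, "a_1") - mean_x) ** 2 for x in X)

def f(x):
    return mean_y + slope * (h(x, "a_1") - mean_x)

y_pred = f((4.0, 6.0))  # predicted output for a test instance x_{n+1}
```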

The Primary Algorithm
In order to solve the problem mentioned above, the algorithm is presented in three stages. First, cluster the data. The purpose is to prepare for the parallel fuzzy granulation of the data. In this process, we design a novel clustering algorithm that adaptively optimizes the cluster centers. According to the algorithm, the cluster centers of the instances are automatically calculated instead of the number of clusters being given in advance. Next, these cluster centers can be used as reference objects independent of the data to granulate the data in parallel. Finally, we transform this instance regression problem into a fuzzy granular regression problem in the granular space. In the fuzzy granular space, we design related operators and loss functions, construct multiple weak fuzzy granular regression trees by optimizing the loss function, and then integrate these weak fuzzy granular regression trees into a strong learner to predict the regression value. The process is shown in Figure 1.

Clustering Algorithm with Automatic Optimization of Cluster Centers.
Classic clustering algorithms need the number of cluster centers to be specified in advance to obtain clustering results. These methods depend heavily on this parameter; if the number of cluster centers is not suitable, they fall into a local minimum solution. An adaptive clustering algorithm that adaptively selects the number of cluster centers is presented, which is a global optimization algorithm. The principle is as follows. As is well known, if the standard deviation δ of the cluster centers is larger and the standard deviation σ_k within clusters is smaller, then the performance is better. Therefore, a loss function L(σ, δ, K) = log(∑_{k=1}^{K} σ_k²) − log δ² is designed as the evaluation criterion, where σ_k denotes the standard deviation of the k-th cluster and K represents the number of cluster centers. The aim is to decrease the loss function value by adjusting the cluster centers until the maximum iteration is reached or the loss function value hardly changes. In each iteration, a set of cluster centers and the corresponding loss function value can be obtained and added to an evaluated set. In each iteration, the ratio of the farthest distance from a remaining instance point to the cluster centers to the sum of the farthest distances from all instance points to the cluster centers is the probability of that point being selected as the next cluster center. When the termination condition is reached, the set of cluster centers corresponding to the minimum loss function value is found in this evaluated set, which is what is required.
Step 1: remove the instances missing some attribute values.
Step 2: normalize the instances.
Step 3: initialize parameters, such as the maximum iteration I, the evaluated set EV = ∅ (which contains cluster centers and loss function values), and the current iteration i = 1.
Step 4: initialize the current cluster center set C_i = ∅ and randomly select one instance point as the first cluster center.
Step 5: calculate the farthest distance between each remaining instance point and all cluster centers, d(x_j) = max_{1≤k≤i} dis(c_k, x_j); the probability of the instance x_j being selected as the next cluster center is P(x_j) = d(x_j)/∑_{x∈X} d(x).
Step 6: if x_j is selected as a cluster center, add it to C_i.
Step 7: if k < i + 1, go to Step 5; otherwise, go to Step 8.
Step 11: in the evaluated set EV, select the cluster centers minimizing the loss function, namely, C* = arg min_{(C_i, L_i) ∈ EV} L_i. Then C* = {c_1, c_2, ..., c_{|C*|}}, and the optimized number of cluster centers is K = |C*| (|·| denotes the number of elements of a set).
Algorithm 1 (input: instance set X, maximum iteration I, threshold value ε; output: optimized cluster center set C*) shows the pseudocode of this principle.
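The principle can be sketched in Python. This is a simplified illustration under assumptions (Euclidean distance, a candidate set of i + 1 centers at iteration i, no early-stopping threshold), not the paper's exact Algorithm 1:

```python
import math
import random

def adaptive_clustering(X, max_iter=10, seed=0):
    """Sketch of adaptive clustering with automatic center-count selection.
    Iteration i builds a candidate set of i + 1 centers: the first is random,
    and each next center is drawn with probability proportional to
    d(x) = max_k dis(c_k, x), following the paper's Step 5. The candidate set
    minimizing L = log(sum_k sigma_k^2) - log(delta^2) is returned."""
    rng = random.Random(seed)
    dis = math.dist
    EV = []  # evaluated set: (loss value, cluster centers)
    for i in range(1, max_iter + 1):
        C = [rng.choice(X)]
        while len(C) < i + 1:
            d = [max(dis(c, x) for c in C) for x in X]
            total = sum(d)
            C.append(rng.choices(X, weights=[v / total for v in d])[0])
        # Assign each instance to its nearest center.
        clusters = {k: [] for k in range(len(C))}
        for x in X:
            k = min(range(len(C)), key=lambda k: dis(C[k], x))
            clusters[k].append(x)
        # Within-cluster deviations sigma_k and center deviation delta.
        def std(pts, ref):
            return math.sqrt(sum(dis(p, ref) ** 2 for p in pts) / len(pts)) if pts else 0.0
        centroid = tuple(sum(c[j] for c in C) / len(C) for j in range(len(C[0])))
        sigma2 = sum(std(clusters[k], C[k]) ** 2 for k in clusters)
        delta2 = std(C, centroid) ** 2
        if sigma2 > 0 and delta2 > 0:
            EV.append((math.log(sigma2) - math.log(delta2), C))
    return min(EV, key=lambda t: t[0])[1]
```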

Parallel Fuzzy Granulation.
In most granulation methods, serial granulation is adopted. To enhance efficiency, we propose parallel fuzzy granulation. First, the cluster centers are obtained by the approach mentioned above. Then, the instances are divided into a set of instance subsets, which are fuzzy granulated with respect to the cluster centers. This process is executed in parallel by MapReduce. According to Algorithm 1, the cluster center set C = {c_1, c_2, ..., c_K} can be obtained.
For ∀x_i ∈ X, ∀a_j ∈ A, and ∀c_k ∈ C, the distance between x_i and c_k on the attribute a_j is denoted by s(x_i, a_j, c_k). A fuzzy granule induced by the instance x_i on the attribute a_j can then be written as q(x_i, a_j) = ∑_{k=1}^{K} s(x_i, a_j, c_k)/c_k. Here, the symbol "/" is a separator and the symbol "+" (the summation) is a union operation; that is, the fuzzy granule q(x_i, a_j) denotes the set of distances between the instance x_i and the cluster centers. Its cardinal can be written as |q(x_i, a_j)| = ∑_{k=1}^{K} s(x_i, a_j, c_k). Four operators of fuzzy granules can be designed: for ∀e, f ∈ R, the operators ∪ and ∩ are defined with a parameter ν ∈ [0, ∞], and for ∀x, x′ ∈ X and ∀a ∈ A, the corresponding operators between the fuzzy granule formed by x and the one induced by x′ are defined analogously. For ∀U ⊆ A with U = {a_1, a_2, ..., a_{|U|}} (|U| ≤ |A|), the fuzzy granular vector induced by x on the attribute set U can be written as Q(x, U) = ∑_{t=1}^{|U|} q(x, a_t)/a_t, where the symbol "+" again denotes a union operation and "/" a separator. Its cardinal can be obtained as |Q(x, U)| = ∑_{t=1}^{|U|} |q(x, a_t)|. Operators of fuzzy granular vectors follow from the granule operators, and from these definitions the distance between two fuzzy granular vectors is given. From the above fuzzy granulation, it can be seen that the fuzzy granules are obtained from the instances and cluster centers using the fuzzy operators, and the fuzzy granular space consists of these fuzzy granules.
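Under these definitions, the granulation can be sketched as follows. Note that the membership function s below is an assumed Gaussian-like placeholder, since the paper's exact distance formula is not reproduced here:

```python
import math

NU = 0.5  # the parameter "nu" from the paper

def s(x, a, c, nu=NU):
    """Membership of instance x w.r.t. cluster center c on attribute a.
    The form exp(-nu * |x_a - c_a|) is an assumed placeholder."""
    return math.exp(-nu * abs(x[a] - c[a]))

def granule(x, a, C):
    """Fuzzy granule q(x, a): memberships of x to every cluster center on a."""
    return [s(x, a, c) for c in C]

def cardinal(q):
    """Cardinal |q| of a fuzzy granule: the sum of its memberships."""
    return sum(q)

def vector(x, U, C):
    """Fuzzy granular vector Q(x, U): one granule per attribute in U."""
    return {a: granule(x, a, C) for a in U}

def vector_cardinal(Q):
    """Cardinal |Q(x, U)|: the sum of the granule cardinals."""
    return sum(cardinal(q) for q in Q.values())

# Illustration of Theorem 2: |Q(x, U)| <= |Q(x, V)| whenever U is a subset
# of V, since the cardinal is a sum of nonnegative memberships.
x = (1.0, 2.0, 0.5)
C = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
U, V = [0], [0, 1, 2]
assert vector_cardinal(vector(x, U, C)) <= vector_cardinal(vector(x, V, C))
```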
Theorem 1. For ∀x, x′ ∈ X, ∀U ⊆ A, and ∀a ∈ U, the distance between fuzzy granular vectors is bounded.

Proof. According to the definition of the fuzzy granule, the corresponding inequality holds for each granule; dividing both sides of the formula by |U| · |C| completes the proof. □

Theorem 2. For ∀x ∈ X, let the attribute subsets U and V satisfy U ⊆ V ⊆ A, and let Q(x, V) and Q(x, U) be the fuzzy granular vectors of x on V and U, respectively. Then, |Q(x, U)| ≤ |Q(x, V)|.

Proof.
For ∀a_t ∈ U, equation (9) shows that Q(x, U) = ∑_{t=1}^{|U|} q(x, a_t)/a_t. Thanks to U ⊆ V, every term of |Q(x, U)| also appears in |Q(x, V)|, and the cardinals are sums of nonnegative values, so |Q(x, U)| ≤ |Q(x, V)|. □

Now, we take a case to describe the fuzzy granulation process. As shown in Table 1, given an instance set X = {x_1, x_2, x_3, x_4}, an attribute set A = {a_1, a_2, a_3}, a regression value set Y = {y_1, y_2, y_3, y_4}, a cluster center set C = {c_1, c_2}, and the parameter ν = 0.5, the fuzzy granulation proceeds as follows.

Mathematical Problems in Engineering
Similarly, we can get the remaining fuzzy granules. Thus, we can obtain the distance between the fuzzy granular vectors of x_1 and x_2 on A with ν = 0.5.

Fuzzy Granular Regression Tree.
After the above fuzzy granulation, the data can be transformed into fuzzy granules.
In the fuzzy granular space, we give the following definition. Suppose T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} is an instance set, x_i is an input variable, and y_i is the output variable corresponding to x_i. Let X = {x_i | i = 1, 2, ..., n} and Y = {y_i | i = 1, 2, ..., n}. C = {c_i | i = 1, 2, ..., K} is a cluster center set and A = {a_i | i = 1, 2, ..., m} an attribute set. For ∀a_j ∈ A and x_i ∈ X, we can obtain the fuzzy granule q(x_i, a_j) via fuzzy granulation (abbreviated below as q). Fuzzy granules and operators can create new fuzzy granules, such as g = w ∩ q ∪ h. Repeating this process can expand a fuzzy granular space G on T. According to the training data, a fuzzy granular rule base can be generated as R = {<Q(x_i, A), y_i> | i = 1, 2, ..., n}. A fuzzy granular regression tree corresponds to a division of the fuzzy granular space and an output value on each divided unit.
Suppose that the input space has been split into d units D_1, D_2, ..., D_d, and there is a fixed output value z_i on each unit D_i. Thus, the fuzzy granular regression tree can be expressed as f(Q) = ∑_{i=1}^{d} z_i · I(Q ∈ D_i). Here, Q is a fuzzy granular vector and I is an indicator function, which equals 1 if Q ∈ D_i and 0 otherwise. The output value on D_i is the mean of the regression values of its fuzzy granular vectors, z_i = (1/‖D_i‖) ∑_{Q(x_j, A) ∈ D_i} y_j, where ‖D_i‖ denotes the number of fuzzy granular vectors in D_i. The question becomes how to divide the fuzzy granular space.
Here, we use a heuristic method to select feature a_j as the segmentation variable and the cardinal of a fuzzy granule, |q(x_s, a_j)|, as the segmentation point. Two areas are defined by D_1(j, s) = {Q | |q(x, a_j)| ≤ |q(x_s, a_j)|} and D_2(j, s) = {Q | |q(x, a_j)| > |q(x_s, a_j)|}. Then, find the optimal segmentation variable a_j and point |q(x_s, a_j)| by solving the loss function min_{j,s} [min_{z_1} ∑_{Q(x_i)∈D_1(j,s)} (y_i − z_1)² + min_{z_2} ∑_{Q(x_i)∈D_2(j,s)} (y_i − z_2)²]. For the input variable a_j, the optimal outputs are the area means z_1 = (1/‖D_1‖) ∑_{Q(x_i)∈D_1} y_i and z_2 = (1/‖D_2‖) ∑_{Q(x_i)∈D_2} y_i, where ‖·‖ denotes the number of elements. Traverse all input variables and find the optimal segmentation variable j to form a pair (j, s). Divide the input space into two areas in turn. Then, repeat the above division process for each area until the termination condition is met. In this way, a fuzzy granular regression tree is generated. The algorithm is described in Algorithm 2.
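A single split step of this procedure can be sketched as follows; the samples, their granule cardinals, and their regression values are invented for illustration:

```python
def best_split(samples):
    """One split step of the fuzzy granular regression tree (sketch).
    Each sample is (cardinals, y), where cardinals[j] = |q(x, a_j)|.
    Traverses every attribute j and every candidate split point
    s = |q(x_s, a_j)|, choosing the pair (j, s) that minimizes the squared
    loss with the two region outputs fixed to the region means, CART-style."""
    m = len(samples[0][0])
    best = None  # (loss, j, s)
    for j in range(m):
        for cand, _ in samples:
            s = cand[j]
            D1 = [y for c, y in samples if c[j] <= s]
            D2 = [y for c, y in samples if c[j] > s]
            if not D1 or not D2:
                continue  # degenerate split: one region is empty
            z1, z2 = sum(D1) / len(D1), sum(D2) / len(D2)
            loss = sum((y - z1) ** 2 for y in D1) + sum((y - z2) ** 2 for y in D2)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best

# Toy data: two attributes' granule cardinals per sample, plus the label y.
samples = [([0.2, 1.5], 1.0), ([0.3, 1.4], 1.2),
           ([0.9, 0.2], 5.0), ([1.0, 0.1], 5.3)]
loss, j, s = best_split(samples)
```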

Boosted Fuzzy Granular Regression Trees.
BFGRT is an algorithm that integrates multiple fuzzy granular regression trees through the idea of ensemble learning to draw conclusions. It does not rely on only one fuzzy granular regression tree but adopts many fuzzy granular regression trees to solve the task together, using the weighted average of the regression values of the multiple fuzzy granular regression trees as the final regression value. Assuming that T is the instance set, the fuzzy granular space is G, the fuzzy granular rule base is R, the number of instances is n, and the number of attributes is m, we construct J fuzzy granular regression trees in parallel as follows:

Step 1: create J map tasks.
Step 2: instance set extraction: randomly draw n fuzzy granular vectors from R with replacement, repeating n times. The probability of each fuzzy granular vector being selected is 1/n. The unselected fuzzy granular vectors form the out-of-bag data used as the test set.
Step 3: attribute extraction: extract t attributes from A to compose the attribute subset B (B ⊂ A).
Step 4: attribute selection: calculate the optimal segmentation attribute a_j and the optimal segmentation point |q(x_s, a_j)| in the dataset of the node, divide the node into two child nodes, and allocate the remaining fuzzy granular vectors to the child nodes.
Step 5: generate a fuzzy granular tree: repeat Step 4 in the fuzzy granular vector set of each child node to recursively split the nodes until all leaf nodes are generated.
Step 6: repeat Steps 2-5 to get J different fuzzy granular regression trees, which correspond to the J map tasks.
Step 7: BFGRT consists of the J fuzzy granular regression trees, and the combination can be executed by the reduce task, that is, F(·) = ∑_{j=1}^{J} ω_j · f_j(·), where ω_j = 1 − exp(δ_j)/∑_{i=1}^{J} exp(δ_i) and δ_j represents the root mean square error (RMSE) of the j-th fuzzy granular tree.

The details of the algorithm are illustrated in Algorithm 3.
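The Step 7 combination can be sketched directly; the toy trees and RMSE values below are invented for the example:

```python
import math

def bfgrt_weights(rmses):
    """Weights omega_j = 1 - exp(delta_j) / sum_i exp(delta_i) from the
    paper: trees with larger RMSE receive smaller weight."""
    denom = sum(math.exp(d) for d in rmses)
    return [1 - math.exp(d) / denom for d in rmses]

def bfgrt_predict(trees, weights, Q):
    """Final regression value as the weighted combination of the J trees."""
    return sum(w * f(Q) for f, w in zip(trees, weights))

# Toy usage: three "trees" as constant predictors with given RMSEs.
trees = [lambda Q: 2.0, lambda Q: 2.2, lambda Q: 3.0]
w = bfgrt_weights([0.5, 0.6, 1.5])
pred = bfgrt_predict(trees, w, None)
```

Note that these weights sum to J − 1 rather than 1; the paper does not state a normalization, so a practical implementation may renormalize them before averaging.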

Regression.
Given a test instance, we fuzzy granulate the test instance to get a fuzzy granular vector. Then, BFGRT is used to predict this fuzzy granular vector to obtain the regression value. Algorithm 6 demonstrates the procedure.

Experimental Analysis
In this experiment, the results presented are generated on a server with 2 × Intel Xeon Gold 6248R @ 2.50 GHz and 64 GB memory. The datasets include 3 datasets gathered from the UC Irvine Machine Learning Repository and 3 constructed datasets with 1% noise (Table 2), and 10-fold cross-validation, which belongs to sampling without replacement, was adopted to test the performance of BFGRT, as illustrated in Algorithm 3. The basic idea of 10-fold cross-validation is to partition the dataset into L nonoverlapping equal parts. In the training process, one part of the dataset is selected to verify the generalization ability of the learner, and the remaining
Algorithm 2 (fuzzy granular regression tree). Input: instance set X, regression value set Y. Output: fuzzy granular tree f.
(1) Remove the instances missing some attribute values.
(2) FOR i = 1 to |X_t|: for x_i ∈ X_t,
(3) FOR j = 1 to m: for a_j ∈ A, fuzzy granulate x_i as q(x_i, a_j) = ∑_{k=1}^{K} s(x_i, a_j, c_k)/c_k; END FOR
(4) Build the fuzzy granular vector Q(x_i, A) = ∑_{j=1}^{|A|} q(x_i, a_j)/a_j; get the label y_i of x_i; build a fuzzy granular rule r_i = <Q(x_i, A), y_i>.
(5) END FOR
(6) Select the optimal segmentation variable j (i.e., the attribute a_j) and segmentation point s (i.e., |q(x_s, a_j)|) by solving the loss function; that is, traverse the variable j and, fixing j, scan the segmentation point s to find the pair (j, s) that minimizes the loss function.
(7) Divide the area with the selected pair (j, s) and decide the output value on each area.

L − 1 parts are used to train the learner. After training L times, L learners can be obtained. This method is very similar to the bagging method and also supports parallel learning. Root mean square error (RMSE) and execution time are the performance metrics. We compared classic fuzzy granulation with the proposed parallel fuzzy granulation in Figure 2 and analyzed the performance of support vector regression (SVR), random forest (RF), long short-term memory (LSTM), and the proposed BFGRT in Figures 3-5. We also give the relation between the number of cluster centers and the RMSE. Fuzzy granulation of data is an important process of modeling. Some traditional fuzzy granulation methods do not require clustering; their idea is to construct a matrix using the similarity of each sample to all other samples. This kind of approach cannot be executed in parallel, only serially, and its time complexity is O(n² · m), where n is the number of instances and m is the number of attributes. The proposed parallel fuzzy granulation is executed as follows. First, we obtain the cluster centers by the clustering algorithm designed in this study. Next, we construct fuzzy granules from each instance and each cluster center. This process can be executed in parallel by MapReduce. The time complexity is O(n · k · m / t), where k denotes the number of cluster centers (k < n) and t represents the number of parallel tasks.

Algorithm 3 (boosted fuzzy granular regression trees). Input: instance set X, regression value set Y, the number of fuzzy granular regression trees J. Output: boosted fuzzy granular regression trees F.
(1) Get a fuzzy granular vector rule base R by parallel fuzzy granulation of the dataset (see Algorithm 2, Algorithm 4, and Algorithm 5).
(2) Create J tasks, namely, map_1, map_2, ..., map_J.
(3) Execute the following operations for each independent task (j = 1, 2, ..., J): MapFunction(key, value), where key = offset of instance and value = (n, t) indicates that n fuzzy granular vectors are randomly selected from R; randomly select t attributes from the attribute set A (constituting the attribute subset B_t^j ⊂ A); form a fuzzy granular rule set R_j, build a fuzzy granular regression tree f_j, and get its RMSE.
(4) FOR i = 1 to instances-total-number
(5) SubsetID = i mod J
(6) context.write(SubsetID, FuzzyGranularVector)
(7) END FOR; END MapFunction
(8) ReduceFunction(key, value) // here, key = SubsetID and value = FuzzyGranularVector
(9) Job.addCache(FuzzyGranularVector[SubsetID])
(10) (f_SubsetID, δ_SubsetID) = train(SubsetID, FuzzyGranularVector) // see Steps 6-9 of Algorithm 2
(11) context.write(1, (f_SubsetID, δ_SubsetID)); END ReduceFunction
(12) F(·) = ∑_{j=1}^{J} ω_j · f_j(·) // calculate BFGRT F as the linear combination of the J fuzzy granular regression trees
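On a single machine, the map/reduce decomposition of granulation can be mimicked; below is a sketch using Python threads in place of MapReduce (the membership function is an assumed placeholder, and threads only illustrate the task split, not true distributed parallelism):

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Cluster centers from the clustering stage and the granulation parameter.
# The exp(-NU * d) membership below is an assumed placeholder, not the
# paper's exact distance formula.
C = [(0.0, 0.0), (5.0, 5.0)]
NU = 0.5

def granulate(x):
    """Map task: fuzzy granulate one instance against every cluster center
    on every attribute, yielding one fuzzy granule per attribute."""
    return [[math.exp(-NU * abs(x[a] - c[a])) for c in C]
            for a in range(len(x))]

def parallel_granulation(X, tasks=3):
    """Partition the instances over `tasks` workers; collecting the results
    in order plays the role of the reduce step. A real deployment would use
    MapReduce over processes or machines."""
    with ThreadPoolExecutor(max_workers=tasks) as ex:
        return list(ex.map(granulate, X))

R = parallel_granulation([(1.0, 2.0), (4.0, 4.5), (0.5, 0.2)])
```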
MapReduce can be used for parallel fuzzy granulation. The main thought is as follows: partition the sequence-file job into multiple independently runnable map tasks, assign them to several processors for execution to produce intermediate results, and then collect them in the reduce task to generate the final output. Map tasks and reduce tasks can both be executed in parallel. The MapReduce process is divided into two parts, i.e., map and reduce. The map function and reduce function of the fuzzy granulation process are shown in Algorithm 4 and Algorithm 5, respectively. The results of parallel granulation are demonstrated in Table 3.

Algorithm 6 (prediction). Input: test instance x, cluster center set C, BFGRT F = (f_1, f_2, ..., f_J; ω_1, ω_2, ..., ω_J). Output: regression value y* of instance x.
(1) Granulate x into the fuzzy granular vector Q(x, A).
(2) FOR j = 1 to J: y_j = f_j(Q(x, A_j)), where the attribute set A_j is selected according to the parameters of f_j.
(3) END FOR
(4) Compute y* as the weighted combination of y_1, ..., y_J.
(5) Return y*.

As shown in Figure 2, when the number of parallel tasks is 3, parallel fuzzy granulation performs better than classic fuzzy granulation by about 252%, 647%, and 438% regarding the metrics of minimum efficiency, maximum efficiency, and average efficiency, respectively.
As shown in Figure 3(a), tested on the dataset combined cycle power plant, the RMSE curve of BFGRT shows a shape that is low in the middle and high on both sides. When the number of cluster centers is between 3000 and 6000, the RMSE of BFGRT is lower than those of the other three methods. When the number of cluster centers is within 3000, the RMSE of BFGRT drops quickly and the slope of the curve decreases significantly. When the number of cluster centers is higher than 5000, the curve of BFGRT shows a small local oscillation but remains lower than the other three methods. In particular, when the number of cluster centers is 3986, the RMSE of BFGRT achieves its minimum value of 3.1151, which is about 46.88%, 4.22%, and 32.85% better than SVR, RF, and LSTM, respectively: BFGRT is slightly better than RF and far better than SVR and LSTM. When the number of cluster centers is 1006, the RMSE of BFGRT reaches its maximum value of 3.4681, which is about 24.20% and 25.24% better than SVR and LSTM, respectively, and about 6.64% worse than RF. On average, the RMSE of BFGRT is 3.2441, better than the 4.5753 of SVR, the 3.2522 of RF, and the 4.6389 of LSTM (i.e., 29.10%, 0.25%, and 30.07% improvement, respectively).
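The percentage improvements quoted in this section are consistent with the convention improvement = (RMSE_other − RMSE_BFGRT) / RMSE_other. For instance, using the average values above:

```python
def improvement(rmse_other, rmse_bfgrt):
    """Relative RMSE improvement of BFGRT over a competitor, as a percentage."""
    return 100.0 * (rmse_other - rmse_bfgrt) / rmse_other

# Average RMSE on combined cycle power plant (values from the text)
print(round(improvement(4.5753, 3.2441), 2))  # vs SVR:  29.1  (text: 29.10%)
print(round(improvement(3.2522, 3.2441), 2))  # vs RF:   0.25  (text: 0.25%)
print(round(improvement(4.6389, 3.2441), 2))  # vs LSTM: 30.07 (text: 30.07%)
```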
After we add 1% noise to the dataset, as illustrated in Figure 3(b), the curve of BFGRT resembles an inverted "Mexican straw hat." When the number of cluster centers is 3986, the RMSE of BFGRT attains its minimum value of 3.3151, while those of SVR, RF, and LSTM are 4.8946, 3.5322, and 4.9289 (i.e., 32.27%, 6.15%, and 32.74% improvement, respectively). When the number of cluster centers is 9360, the RMSE of BFGRT reaches its maximum value of 3.5651, which improves on SVR and LSTM by about 27.16% and 27.67%, respectively, and is about 0.93% worse than RF. On average, the RMSE of BFGRT is 3.4300, which performs about 29.92%, 2.89%, and 30.41% better than SVR, RF, and LSTM, respectively. In terms of the degree of noise influence, the RMSE of SVR, RF, LSTM, and BFGRT on the noisy dataset increases by about 6.98%, 8.61%, 6.25%, and 5.73%, respectively. BFGRT is thus less affected by noise and more robust than the other three algorithms.
As demonstrated in Figure 4(a), the dataset bias correction of numerical prediction model temperature forecast contains 7750 instances and 25 features. The RMSE curve of BFGRT can be roughly divided into two segments, with 4000 cluster centers as the cutting point: below 4000 the curve descends, and above 4000 it ascends. When the number of cluster centers is 4021, the RMSE of BFGRT attains its minimum value of 0.7013, while SVR, RF, and LSTM only reach 0.9129, 0.7292, and 0.9734 (that is, 23.18%, 3.83%, and 27.95% improvement, respectively). When the number of cluster centers is 1592, the RMSE of BFGRT reaches its maximum value of 0.7387, which is about 1.30% higher than RF and about 19.08% and 24.11% lower than SVR and LSTM, respectively. The average RMSE of BFGRT is 0.7189, which is about 21.25%, 1.42%, and 26.15% better than SVR, RF, and LSTM, respectively. The performance on a noisy dataset is shown in Figure 4(b). The RMSE of BFGRT is lower than those of the other three algorithms, with a minimum of 0.9016, a maximum of 0.9397, and an average of 0.9221. By contrast, the average RMSE of SVR, RF, and LSTM is 1.2143, 1.0297, and 1.2745, respectively, so the mean RMSE of BFGRT is about 24.07%, 10.45%, and 27.65% better. On the dataset containing noise, the RMSE of SVR, RF, LSTM, and BFGRT increases by 33.02%, 41.21%, 30.93%, and 27.48%, respectively; it can be seen that BFGRT is less sensitive to noise than the other three algorithms.

The number of instances in the dataset metro interstate traffic volume is more than 4 times that of the datasets mentioned above. As illustrated in Figure 5(a), the maximum RMSE of BFGRT is 2259.1858, while that of SVR is 3050.0148 and that of RF is 2435.1646 (i.e., 25.93% and 7.23% improvement, respectively). Compared with LSTM, the RMSE of BFGRT is about 2.27% lower.
The minimum RMSE of BFGRT is 2154.1888, which performs 29.37%, 11.54%, and 2.49% better than SVR, RF, and LSTM, respectively. The mean RMSE of BFGRT is 2202.8738, which is 27.77%, 9.54%, and 0.28% better than SVR, RF, and LSTM, respectively. On the dataset containing 1% noise, as shown in Figure 5(b), BFGRT improves the performance by about 29.08%, 11.20%, and 1.86% over SVR, RF, and LSTM, respectively. The mean RMSE of SVR, RF, LSTM, and BFGRT increases by 2.04%, 2.07%, 2.10%, and 0.20% on the noisy dataset, respectively. Compared with the other three algorithms, BFGRT is the one least affected by noise.
The number of instances in the dataset online shopping is more than 160000. As shown in Figure 6(a), the maximum RMSE of BFGRT is 1.88, while that of SVR is 2.12, that of RF is 2.03, and that of LSTM is 1.89 (i.e., 11.32%, 7.39%, and 0.53% improvement, respectively). On the dataset containing 1% noise, as shown in Figure 6(b), the maximum RMSE of BFGRT is 1.94, about 16.01%, 9.77%, and 0.51% better than SVR, RF, and LSTM, respectively.
From the above analysis, it can be seen that BFGRT outperforms SVR, RF, and LSTM on the six datasets. In particular, it is stable on the three datasets with noise and is less disturbed by noise. Judging from the shape of its RMSE curve, BFGRT is low in the middle and high on both sides: when the number of cluster centers is close to the median of the number of instances, the performance is optimal. For datasets that contain noise, we also found that BFGRT has better robustness. The main reason is that the fuzzy granulation process of BFGRT embeds the idea of global comparison, which can overcome noisy interference to some extent. This is also a great advantage of the designed BFGRT.

Conclusion
In this study, we propose BFGRT, an algorithm suitable for the regression problem. In the algorithm, the idea of parallel fuzzy granulation is introduced to further improve the efficiency of data granulation. In the process of parallel fuzzy granulation, we design a clustering algorithm with automatic optimization of cluster centers. Through parallel fuzzy granulation, a regression problem can be solved in the fuzzy granular space, where we present new operators and metrics between fuzzy granules. In the fuzzy granular space, we design a loss function to select the optimal attribute as the split point and recursively construct a fuzzy granular regression tree. Based on these, we build multiple fuzzy granular regression trees over different attribute subsets to form BFGRT, which predicts a test instance. In the future, BFGRT can be combined with cloud computing and the Internet of Things to process big data.

Data Availability
The dataset used to support the findings of this study is available from the UC Irvine Machine Learning Repository.

Conflicts of Interest
The authors declare that they have no conflicts of interest.