An Extended Clustering Membrane System Based on Particle Swarm Optimization and Cell-Like P System with Active Membranes

An extended clustering membrane system using a cell-like P system with active membranes based on particle swarm optimization (PSO), named PSO-CP, is designed, implemented, and tested. The purpose of PSO-CP is to solve clustering problems. In PSO-CP, evolution rules based on the standard PSO mechanism are used to evolve the objects, and communication rules are adopted to accelerate convergence and avoid premature convergence. Subsystems of membranes are generated and dissolved by the membrane creation and dissolution rules, and a modified PSO mechanism is developed to help the objects escape from local optima. Under the control of the evolution-communication mechanism, the extended membrane system can effectively search for the optimal partitioning and improve the clustering performance with the help of the distributed parallel computing model. This extended clustering membrane system is compared with five existing PSO clustering approaches on ten benchmark clustering problems, and the computational results demonstrate the effectiveness of PSO-CP.


Introduction
The scopes and scales of datasets are growing exponentially with the advent of new sources of data generation. This growing tendency poses a serious challenge for discovering knowledge from data. Data clustering is one of the important techniques used in data mining [1]. It aims to put similar data points into the same group or cluster using the characteristics of the data, without any prior knowledge about the groups or clusters. Therefore, implicit patterns or knowledge can be extracted through data clustering [2]. Traditional data clustering approaches can be categorized into partition clustering, hierarchical clustering, density clustering, and grid clustering [3]. These clustering methods have low time complexity and are easy to implement, but they may produce results, such as highly skewed dendrograms, that do not reflect the true structures of the datasets [4, 5]. Therefore, some evolutionary approaches have been introduced to solve clustering problems in recent years [6], such as genetic algorithms [7], particle swarm optimization (PSO) [8], differential evolution (DE) [9], artificial bee colony (ABC) [10], and ant colony optimization (ACO) [11], among others. PSO is a global optimization technique based on the intelligence strategy of a population, and much work has been done on using PSO to solve clustering problems.
Netjinda et al. [12] presented a PSO approach, named starling PSO (SPSO), which is inspired by the collective response behavior of starling birds. The collective information of the neighbors is used to replace the local information in history, and subpopulations are generated when premature convergence appears. Song et al. [13] proposed an improved PSO procedure based on the features of the clustered data. An environment factor, represented by the cluster centers of the partitioning results, is added to the velocity adjustment in PSO to improve the global searching ability. Liu et al. [14] developed a modified coevolutionary multiswarm optimizer based on a new velocity updating and similarity detection mechanism. Lassad et al. [15] designed a PSO procedure with a new adaptive inertia weight and time acceleration coefficients for solving fuzzy clustering problems. Asgarali and Abdolreza [16] introduced a K-harmonic means clustering approach, which integrates the improved cuckoo search and PSO. Pereira de Gusmão and de Carvalho [17] proposed two hybrid clustering methods for multiview relational data, taking advantage of the global convergence ability of PSO and the local exploitation of hard clustering algorithms in the update of the position vectors in PSO. Manju and Kumar [18] developed a new sustainable clustering method based on PSO with a mutation operator for clustering data generated from different networks. Huang et al. [19] designed a memetic clustering approach based on PSO and the gravitational search algorithm, using a hybrid operation and a diversity enhancement as the two main mechanisms. Zhang et al. [20] presented a new clustering approach based on PSO with a leader updating mechanism and ring topology for multimodal multiobjective optimization.
Although PSO has shown great potential in solving clustering problems, it still has some limitations, such as easily falling into local optima and exhibiting premature convergence. Furthermore, the computational complexity of PSO clustering approaches may increase quickly as the number of data points in the dataset increases [21]. Therefore, more studies are needed to improve the performance of PSO for clustering.
Membrane computing, also known as membrane systems or P systems, is a novel approach of bio-inspired computing initiated by Pȃun [22]. It seeks to discover novel biological computing models from the structure of biological cells as well as the cooperation of cells in tissues and organs. Parallel computation in membrane systems can avoid the increase in time consumption as the number of data points increases. Therefore, membrane systems are suitable for solving clustering problems [23]. Research shows that some models of P systems have the same computing power as Turing machines and are, to some extent, more efficient [24]. Spiking neural P systems are a kind of neural-like P systems in membrane computing; they provide a class of parallel computing models [25]. Many variants of spiking neural P systems have been proposed [26, 27] and have been applied to various real-world problems [28, 29].
Xue and Liu [30] developed a new communication P system for solving clustering problems. Liu and Xue [31] proposed a new cluster splitting technique based on Hopfield neural networks and P systems. Liu et al. [32] presented an improved Apriori algorithm, named ECTPPT-Apriori, based on evolution-communication tissue-like P systems with promoters and inhibitors. Peng et al. [33] designed a tissue-like membrane system with a fully connected structure using an inherent mechanism to deal with automatic clustering problems. Peng et al. [34] developed an extended membrane system with active membranes, in which a modified differential evolution mechanism is used to find the optimal cluster centers in clustering problems. Peng et al. [35] introduced a multiobjective clustering framework using a tissue-like membrane system for fuzzy clustering problems. Wang et al. [36] proposed a new cell-like P clustering system using a modified genetic algorithm to evolve the objects and using communication rules in the cell-like P system to enhance the diversity of the populations. The traditional evolution mechanism is easily trapped in local optima, a phenomenon called premature convergence, which is a main limitation of PSO in solving optimization problems. Many previous studies have paid close attention to improving the global searching ability and avoiding premature convergence. Membrane systems are distributed parallel computing models and can effectively avoid premature convergence and improve the global searching ability of PSO. Over the past years, a variety of membrane systems integrated with PSO have been proposed and proved powerful and efficient in solving optimization problems. Xiao et al. [37] proposed a hybrid membrane evolutionary algorithm, which combines a one-level membrane structure with a PSO local search algorithm. Xiao et al. [38] developed an improved dynamic membrane evolutionary algorithm based on PSO and DE to solve constrained engineering design problems.
Singh and Deep [39] designed a new multiple-PSO based membrane algorithm with seven different membranes for solving real-life problems. Elkhani et al. [40] proposed a kernel P system and introduced multiobjective binary PSO to feature selection and classification methods with time efficiency on GPU. Furthermore, the inherent mechanism based on communication rules between different cells or membranes can accelerate the convergence of PSO. Therefore, membrane systems are used to enhance the clustering performance of PSO in this study. Each cell or membrane in a membrane system, as an independent computing unit, can be regarded as a subpopulation of particles in PSO, and the cooperation of subpopulations can be viewed as the communication between membranes [41].
This work focuses on the development of a membrane computing model, as an extended membrane system, to solve clustering problems and overcome the limitations mentioned above. A new clustering method based on membrane systems and the PSO mechanism is proposed.
This membrane system with active membranes has a dynamic membrane structure during evolution and computation. The velocity updating mechanism of PSO is used as the basic evolution rule for the objects in the elementary membranes. Another evolution rule, based on the fitness Euclidean-distance ratio (FER) [42] method, is introduced to allow the objects to escape local optima in the membranes of the subsystems. The communication mechanism between membranes is adopted to transport the best objects in order to accelerate the convergence of the P system. The system is evaluated on 10 benchmark clustering problems to verify the validity and performance of the extended clustering membrane system. The rest of this paper is organized as follows. The clustering problems are described in Section 2. The framework of cell-like P systems with active membranes is given in Section 3; these concepts underlie the development of the proposed extended clustering membrane system. Section 4 describes the details of the extended clustering membrane system based on cell-like P systems and the PSO mechanism. Experimental results on benchmark clustering problems are reported in Section 5. Section 6 provides conclusions and outlines future research directions.

Data Clustering
In this section, the basic concepts of data clustering problems are described in detail. Let X = {x_1, x_2, ..., x_N} be a dataset containing N unlabelled data points. Data point i, for i = 1, 2, ..., N, is represented by x_i = (x_i1, x_i2, ..., x_id), with d representing the dimension of the data. The purpose of a clustering problem is to find a partition of the dataset such that similar data points fall in the same cluster. The partition result is represented by C = {c_1, c_2, ..., c_K}, where K is the number of clusters and c_k is cluster k, for k = 1, 2, ..., K. The vector of cluster centers is represented by Z = {z_1, z_2, ..., z_K}, with z_k representing the cluster center of c_k [43].
A partition must satisfy some conditions: the data points in the same cluster should be as similar as possible, and the data points in different clusters should be as different as possible. Usually, a clustering technique searches the solution space for the optimal cluster centers based on some clustering measure. A commonly used clustering measure, called the fitness function, is defined as

f(X, Z) = Σ_{i=1}^{N} Σ_{j=1}^{K} ω_ij ‖x_i − z_j‖²,   (1)

where ω_ij is the associated weight for data point x_i belonging to cluster j. If data point x_i is allocated to cluster j, ω_ij = 1; otherwise, ω_ij = 0. A clustering process separates the data points into the corresponding clusters, which can be viewed as an optimization problem whose purpose is to find a partition, or a set of cluster centers, that minimizes the fitness function (1), i.e.,

min_Z f(X, Z).   (2)

In addition, the value of the fitness function f is used to evaluate the performance of clustering techniques and to compare the quality of objects, or potential solutions. When two objects are compared, the one with the smaller fitness value is the better one.
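The fitness measure (1) with nearest-center assignment (ω_ij = 1 only for the closest center) can be sketched as follows; this is a minimal illustration assuming squared Euclidean distance, and the function name clustering_fitness is our own:

```python
import numpy as np

def clustering_fitness(X, Z):
    """Fitness (1): each point contributes its squared Euclidean distance
    to the nearest cluster center, i.e. w_ij = 1 only for the closest z_j."""
    # d2[i, j] = ||x_i - z_j||^2, an N x K matrix built via broadcasting
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Two tight clusters with centers sitting between their two points:
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
Z = np.array([[0.0, 0.5], [10.0, 10.5]])
print(clustering_fitness(X, Z))  # each point is 0.5 away -> 4 * 0.25 = 1.0
```

A smaller value indicates a better set of cluster centers, matching the comparison rule stated above.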

Cell-Like P Systems
3.1. The Basic Cell-Like P Systems. Cell-like P systems are a class of membrane systems, which abstract computing models from cell structures and functions or from the group collaboration of cells. Research shows that the computation ability of simple cell-like P systems equals that of Turing machines [44]. The usual cell-like P systems have a tree membrane structure, that is, a simple graph. Each membrane contains a set of objects or symbols that can be evolved and communicated by evolution and communication rules. A basic cell-like P system can be expressed as the tuple

Π = (O, H, μ, w_1, ..., w_q, R, R′, i_0),

where q ≥ 1 is the degree of the system; O is a finite alphabet, whose symbols are called objects, i.e., u and v represent different objects of the alphabet, where u, v ∈ O; H is a finite set of labels for the membranes; μ is the membrane structure consisting of q membranes, whose regions are labelled by the elements of H; w_1, ..., w_q are the multisets of objects placed in the regions of the membranes, with w_i ∈ O, for 1 ≤ i ≤ q; R represents the finite sets of evolution rules associated with the membranes; R′ represents the finite sets of communication rules between different membranes; and i_0 is the output region or output membrane of the P system [45]. A cell-like P system is a hierarchy of q membranes or cells, where each membrane or cell may contain one or more other membranes or cells. A membrane or cell contains many objects in the system, and an object u, with u ∈ O, represents a potential solution in the search space. A membrane is called an elementary membrane if it does not contain any other membranes; an elementary membrane has no children membranes in the system. A membrane is called a nonelementary membrane if it contains other membranes. A membrane is called a skin membrane if it is not contained in any other membrane; the skin membrane has no parent membrane in the system.
The degree of the system is the number of elementary membranes in the system. A cell-like P system is mainly composed of three parts: membrane structure, objects, and rules. Figure 1(a) gives a graphical representation of a simple cell-like P system, in which membranes are labelled from 0 to 9. These membranes are arranged in a hierarchical structure, which can be represented as a tree diagram as shown in Figure 1(b).
In Figure 1(a), membrane 0 is the skin membrane, which is not contained in any other membrane, and membrane 1 is the parent membrane of membranes 4 and 5 and is a nonelementary membrane. Membrane 4 is an elementary membrane, which does not contain any other membranes. The tree structure in Figure 1(b) is an abstraction of the cell-like P system in Figure 1(a). The nodes of the tree represent the membranes, the leaf nodes represent the elementary membranes, and the root node 0 represents the skin membrane 0. A node in a layer is the parent of the nodes following it in the next layer, and the nodes in the next layer are the children of the node in the layer above. The degree of this cell-like P system is 6. Specially, a membrane only communicates with its parent membrane and its children membranes, if it has any; no communication rules exist between sibling membranes.
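The membrane hierarchy and its classification into skin, nonelementary, and elementary membranes can be illustrated with a toy parent-to-children map; the exact nesting below is an assumption loosely modeled on Figure 1, chosen only so that the degree comes out to 6:

```python
# parent -> children; membrane 0 is the skin (root of the tree).
# The nesting is an illustrative assumption, not the exact figure.
children = {0: [1, 2, 3], 1: [4, 5], 2: [6, 7], 3: [8, 9],
            4: [], 5: [], 6: [], 7: [], 8: [], 9: []}

def is_elementary(h):
    """A membrane with no children membranes is elementary."""
    return not children[h]

def degree():
    """The degree of the system is the number of elementary membranes."""
    return sum(is_elementary(h) for h in children)

print(degree())  # 6, as in the example system of Figure 1
```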
A cell-like P system usually has two types of rules: evolution rules R and communication rules R′. Evolution rules are of the form R = {u ⟶ v}, for u, v ∈ O, which means that a copy of object u evolves to object v.

Mathematical Problems in Engineering
Communication rules are of the form R′ = {u ⟶ (v, in_h)}, for h ∈ H and u, v ∈ O, which means that a copy of object u is changed to object v and transported into membrane h. The object is modified in the communication process, and membrane h is the parent or a child of the membrane where u originally was.

An Extended Cell-Like P System with Active Membranes.
The evolution and communication rules in cell-like P systems only execute on objects, not on membranes. The objects are changed and moved by the evolution-communication rules, but the membranes do not change during the evolution and computation. Therefore, an extended cell-like P system with active membranes is introduced to overcome this restriction. This extended P system contains not only evolution and communication rules for objects but also evolution rules for membranes.
There are two types of membrane evolution rules: creation rules and dissolution rules. Membrane creation rules are of the form

[u]_h ⟶ [[v]_{h_1} · · · [v]_{h_sn}]_h,

for h, h_1, ..., h_sn ∈ H and u, v ∈ O, which means that membranes h_1 to h_sn are created and a copy of object u in membrane h is evolved to v and transported into these newly created membranes, where sn is the number of newly created membranes. Membrane dissolution rules are of the form [u]_h ⟶ λ, for u ∈ O, which means that the membrane is dissolved and object u in membrane h disappears, where λ is a special symbol representing the absence of objects in the membrane. Therefore, the extended cell-like P system with active membranes has a dynamic membrane structure during the evolution and computation process [46].
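The two membrane rules can be sketched on a toy representation that stores the system as a label-to-objects dictionary; the string labels and the helper names create and dissolve are our own illustrative choices:

```python
# A toy sketch of the two membrane rules, assuming the system is stored as
# {membrane_label: list_of_objects}; labels like "3_1" are hypothetical.
def create(membranes, h, u, v, sn):
    """Creation rule: spawn sn new membranes inside h, each receiving a
    copy of object u evolved into v."""
    new_labels = [f"{h}_{g}" for g in range(1, sn + 1)]
    for hg in new_labels:
        membranes[hg] = [v]  # evolved copy placed in each new membrane
    return new_labels

def dissolve(membranes, h):
    """Dissolution rule [u]_h -> lambda: membrane h and its objects vanish."""
    del membranes[h]

system = {"3": ["u"]}
create(system, "3", "u", "v", 2)   # membranes "3_1" and "3_2" appear
dissolve(system, "3_2")            # and one of them is dissolved again
```

In a real P system the dissolved membrane's surviving contents would be inherited by its parent; that detail is omitted here because PSO-CP dissolves entire subsystems at once.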

An Extended Clustering Membrane System
An extended membrane system with active membranes, called the particle swarm optimization cell-like P system (PSO-CP), is introduced to solve clustering problems. This system has two main mechanisms: the evolution-communication mechanism for objects and the evolution mechanism for membranes. More details about PSO-CP are given in the following.

Initialization
The Initial Membrane Structure. The membrane structure of PSO-CP is built dynamically through membrane creation and dissolution rules during the evolution and computation process; therefore, the number of membranes changes over time. Specifically, PSO-CP starts with an initial membrane structure, and the membrane evolution mechanism then controls the structure and the number of membranes. The initial membrane structure of a PSO-CP is depicted graphically in Figure 2.
In Figure 2, the PSO-CP with a three-layer nesting structure contains a skin membrane 0, a nonelementary membrane 1, and elementary membranes 2 to q. These membranes are labelled from 0 to q. The nonelementary membrane, also called the comparison membrane, is labelled 1; its role is to find the best object in the elementary membranes during the current evolution process and output the best object to the outermost membrane. Membrane 1 is the parent of the elementary membranes 2 to q, and elementary membranes 2 to q are the children of nonelementary membrane 1. The outermost membrane, labelled 0, is the skin membrane, whose role is to store the best object found in the system during the current evolution and computation process.

Object Representation.
In the PSO-CP, an object is a set of cluster centers representing a feasible solution; therefore, the objects represent candidate clustering results. Each object u, u ∈ O, is designed as a composite K × d dimensional vector [47] of the form

u = (z_1, z_2, ..., z_K),

where z_k = (z_k1, z_k2, ..., z_kd) corresponds to the cluster center of cluster k and d, as mentioned earlier, represents the dimension of the data points. Hence, an object u = (z_1, z_2, ..., z_K) represents a set of cluster centers. As usual, each membrane has at least one object. To ensure the same computational complexity in each elementary membrane, every elementary membrane contains the same number m of objects. After initialization, the fitness value of each object is calculated using (1). In membrane o, for o = 2, 3, ..., q, the best position in history with the lowest fitness value for u_p, denoted by u_p^lbest(0) and called the local best, is determined for p = 1, 2, ..., m. The best position in history with the lowest fitness value among all objects in membrane o, denoted by u_o^gbest(0) and called the global best, is also determined. The local best and the global best refer to positions of objects; the object that attained the global best is called the global best object.
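Packing and unpacking an object as a K × d vector might look as follows; the row-major layout and the sampling of data points for initialization are assumptions, since the paper does not state these details:

```python
import numpy as np

def decode(u, K, d):
    """View an object u as its K cluster centers (rows z_1 .. z_K).
    Row-major packing (z_1 first) is assumed."""
    return np.asarray(u).reshape(K, d)

def random_object(X, K, rng):
    """Initialize an object by sampling K distinct data points as centers,
    a common initialization choice (not prescribed by the paper)."""
    idx = rng.choice(len(X), size=K, replace=False)
    return X[idx].ravel()

u = decode([1.0, 2.0, 3.0, 4.0], K=2, d=2)
print(u)  # [[1. 2.]
          #  [3. 4.]]
```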

Evolution Rules for the Objects.
The PSO-CP has two types of evolution rules: the basic evolution rules and the local evolution rules.
(1) The Basic Evolution Rules. In this work, the standard PSO [48] is used to search for the optimal solutions in elementary membranes 2 to q. The evolution of objects is achieved only within elementary membranes, and the basic evolution rules only execute on objects contained in elementary membranes 2 to q. Let u_p(t) and V_p(t) represent the position and velocity of u_p at time t in elementary membrane o, for o = 2, 3, ..., q. The velocity of u_p at time t + 1 is determined by

V_p(t + 1) = w · V_p(t) + c_1 r_1 (u_p^lbest(t) − u_p(t)) + c_2 r_2 (u_o^gbest(t) − u_p(t)),   (6)

where w is the inertia weight; c_1 and c_2 represent the local and global learning factors, which control the influence of the local best and the global best; t is the time or iteration counter; and r_1 and r_2 are two uniform random numbers [49]. The local best of u_p is denoted by u_p^lbest(t), and the global best in elementary membrane o is denoted by u_o^gbest(t) at iteration t. For notational convenience, the global best object is also denoted by u_o^gbest(t). The position of u_p at time t + 1 is determined by

u_p(t + 1) = u_p(t) + V_p(t + 1).   (7)

The local best of u_p at time t + 1 is updated according to

u_p^lbest(t + 1) = u_p(t + 1) if f(u_p(t + 1)) < f(u_p^lbest(t)), and u_p^lbest(t + 1) = u_p^lbest(t) otherwise.   (8)

The global best in membrane o at iteration t + 1 is updated according to

u_o^gbest(t + 1) = arg min_{p = 1, ..., m} f(u_p^lbest(t + 1)).   (9)

The inertia weight is updated dynamically to enhance the global searching ability of the objects and to avoid premature convergence. A linearly increasing approach, given by (10), is used to update the inertia weight:

w(t) = w_min + (w_max − w_min) · t / t_max,   (10)

where w_min and w_max represent the minimum and maximum of the inertia weight and t_max is the maximum number of iterations.
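The basic evolution rules, i.e., the velocity update, position update, and linearly increasing inertia weight, can be sketched as below; the learning factors c_1 = c_2 = 2.0 and the inertia bounds 0.4 and 0.9 are common PSO defaults, not values taken from the paper:

```python
import numpy as np

def pso_step(u, V, lbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One basic evolution step: velocity update (6), position update (7).
    c1 = c2 = 2.0 are conventional defaults (assumed, not from the paper)."""
    if rng is None:
        rng = np.random.default_rng()
    r1, r2 = rng.random(), rng.random()  # uniform random numbers
    V = w * V + c1 * r1 * (lbest - u) + c2 * r2 * (gbest - u)
    return u + V, V

def inertia(t, t_max, w_min=0.4, w_max=0.9):
    """Linearly increasing inertia weight (10); the bounds are assumptions."""
    return w_min + (w_max - w_min) * t / t_max
```

An object sitting at the origin with both bests at (1, 1) is pulled toward them, since both difference terms in (6) are positive.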
(2) The Local Evolution Rules. A local search strategy based on the FER is adopted in the local evolution rules for the objects, modifying the way the velocity and position of an object are updated. The local evolution of objects is achieved only within a subsystem of membranes, to help objects escape from local optima, and the local evolution rules only execute on objects contained in the membranes of the subsystem. The modified velocity is determined by

V_p(t + 1) = χ · [V_p(t) + r_3 (u_p^lbest(t) − u_p(t)) + r_4 (u_p^nbest(t) − u_p(t))],   (11)

and the modified position is determined by

u_p(t + 1) = u_p(t) + V_p(t + 1),   (12)

where χ is a constriction coefficient used to prevent an object from evolving too far away from the search space, given by χ = 2/|2 − φ_max − √(φ_max² − 4φ_max)|, and r_3 and r_4 are two random numbers uniformly distributed in [0, φ_max/2]. Here, φ_max is a positive constant with φ_max = 4.1 [42]. In (11), u_p^nbest(t) represents the best neighbor of u_p in its neighborhood, where the neighborhood is a subset of objects in the current evolution process. The objects in a membrane of the subsystem are listed in decreasing order of their FER values. The value of the FER for a given u_p and any other u_{p′}, for p′ ≠ p and p′ = 1, 2, ..., m, is determined by

FER_(p, p′) = α · (f(u_p^lbest(t)) − f(u_{p′}^lbest(t))) / ‖u_p^lbest(t) − u_{p′}^lbest(t)‖,   (13)

where ‖u_p^lbest(t) − u_{p′}^lbest(t)‖ is the Euclidean distance between u_p^lbest(t) and u_{p′}^lbest(t) at the current iteration t. Clearly, u_p and u_{p′} must belong to the same membrane o. In (13), α = ‖D‖ / (f(u_o^worst(t)) − f(u_o^gbest(t))) is a control parameter, where u_o^worst(t) represents the worst object with the highest fitness value in membrane o, u_o^gbest(t) represents the best object in the same membrane o, and ‖D‖ is the size of the data space.
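A sketch of selecting the FER-based neighborhood best u_p^nbest for each object, assuming minimization (so a positive FER means the neighbor's local best is better) and a small constant added to avoid division by zero; the function name fer_neighbor is our own:

```python
import numpy as np

def fer_neighbor(lbests, fitnesses, D_size):
    """For each object p, return the index p' maximizing FER (13).
    alpha = ||D|| / (f_worst - f_best), as in the text; eps guards /0."""
    eps = 1e-12
    alpha = D_size / (max(fitnesses) - min(fitnesses) + eps)
    nbest = []
    for p in range(len(lbests)):
        best_val, best_q = -np.inf, p
        for q in range(len(lbests)):
            if q == p:
                continue
            dist = np.linalg.norm(lbests[p] - lbests[q]) + eps
            fer = alpha * (fitnesses[p] - fitnesses[q]) / dist
            if fer > best_val:
                best_val, best_q = fer, q
        nbest.append(best_q)
    return nbest

lbests = [np.array([0.0, 0.0]), np.array([0.0, 1.0]), np.array([5.0, 5.0])]
nbest = fer_neighbor(lbests, [3.0, 1.0, 0.0], D_size=10.0)
```

Object 0 prefers the nearby, moderately better neighbor 1 over the distant, even better neighbor 2: the FER trades off fitness gain against distance, which is what lets objects in a subsystem drift toward different local structures.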

The Communication Rules.
The communication rules in the PSO-CP realize the exchange and sharing of the global best objects among elementary membranes 2 to q and nonelementary membrane 1, as well as between nonelementary membrane 1 and skin membrane 0.
This communication only exists between children membranes and the parent membrane. In order to enhance the global searching ability of the objects, the PSO-CP has two types of communication rules, i.e., the communication rules in the elementary membranes and the communication rules in the nonelementary membranes.
(1) The Communication Rules in the Elementary Membranes. A copy of the global best object u_o^gbest(t) in elementary membrane o, for o = 2, 3, ..., q, is evolved to be a local best object u_{p_o}^lbest(t) and is sent to nonelementary membrane 1 at iteration t. Note that the global best object u_o^gbest(t) still stays in elementary membrane o. Therefore, nonelementary membrane 1 contains q − 1 objects from the q − 1 elementary membranes at iteration t. Nonelementary membrane 1 will select the best object among these local best objects from the elementary membranes as the global best object u_1^gbest(t).
(2) The Communication Rules in the Nonelementary Membranes. A copy of the local best object u_{p_o}^lbest(t) in nonelementary membrane 1 is evolved to be the local best u_r^lbest(t) of an object u_r and is sent to elementary membrane o + 1 at iteration t, where u_r is selected at random in membrane o + 1. Specially, for o = q, a copy of the local best object u_{p_q}^lbest(t) is evolved to be the local best u_r^lbest(t) of a randomly selected object u_r in elementary membrane 2. At the same time, membrane 1 transports a copy of the global best object u_1^gbest(t) to the skin membrane 0 and evolves it into the local best object u_0^lbest(t). If the local best object u_0^lbest(t) at iteration t is better than the global best object u_0^gbest(t − 1) from iteration t − 1 in the skin membrane 0, the local best object u_0^lbest(t) becomes the global best object u_0^gbest(t) at iteration t. Thus, the skin membrane 0 always holds the global best object of the whole membrane system at iteration t and outputs it to the environment at the end of the evolution and computation process. In addition, at the beginning of the process, the local best object u_0^lbest(0) from nonelementary membrane 1 is placed into the skin membrane 0 as the best object u_0^gbest(0).
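The communication step can be sketched as below, under the assumption (suggested by the text) that each elementary membrane o seeds a random object in membrane o + 1, with membrane q wrapping around to membrane 2, while membrane 1 keeps the system-wide best for the skin membrane:

```python
# A sketch of one communication round. Each elementary membrane o in 2..q
# reports its (gbest_position, gbest_fitness); the ring order and the data
# layout are our own assumptions based on the description in the text.
def communicate(gbests):
    """gbests: dict o -> (position, fitness), o = 2..q.
    Returns membrane 1's pick (forwarded to the skin) and the ring map
    telling which membrane each reported gbest is re-injected into."""
    best_o = min(gbests, key=lambda o: gbests[o][1])  # smaller fitness wins
    labels = sorted(gbests)
    ring = {o: labels[(i + 1) % len(labels)]          # o seeds o+1, q seeds 2
            for i, o in enumerate(labels)}
    return gbests[best_o], ring

gbests = {2: ("a", 3.0), 3: ("b", 1.0), 4: ("c", 2.0)}
best, ring = communicate(gbests)
print(best, ring)  # ('b', 1.0) {2: 3, 3: 4, 4: 2}
```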

Creation and Dissolution Rules for Membranes.
Membrane rules form the evolution mechanism of membrane systems, which differs from the traditional evolution and communication rules for the objects. The PSO-CP uses two types of membrane rules: membrane creation and dissolution rules [50]. The membrane rules only change the elementary membranes, while the nonelementary and skin membranes always stay the same in the evolution and computation process.

Membrane Creation Rules. The membrane creation rules generate a subsystem of sn membranes for elementary membrane o when the global best object in that membrane cannot be further improved, and retain the best membrane containing the global best object to continue the subsequent evolution and computation process. Figure 4 shows an example of membrane creation. At iteration t, a new subsystem of elementary membrane 3 is created when the global best object cannot be further improved for limit iterations, and this subsystem consists of sn membranes. All the objects in membrane 3 are copied to the membranes of this subsystem. The local evolution rules for objects are used to evolve the objects in each membrane of the subsystem. At the same time, elementary membrane 3 continues to evolve. After one evolution, each membrane in the subsystem transports a copy of its global best object u_{3_g}^gbest(t + 1), for g = 1, 2, ..., sn, to nonelementary membrane 1. Nonelementary membrane 1 selects the global best object v_3^gbest(t + 1) among the global best objects u_{3_g}^gbest(t + 1) of the membranes in the subsystem and the global best object u_3^gbest(t + 1) of elementary membrane 3. The membrane containing the global best object v_3^gbest(t + 1) is kept and replaces elementary membrane 3.

Membrane Dissolution Rules.
During the evolution and computation process of the PSO-CP, membranes are dissolved through the membrane dissolution rules, but dissolution happens only on elementary membranes and the membranes in their corresponding subsystems. The PSO-CP destroys a subsystem by the membrane dissolution rules once the best object in the subsystem or in the corresponding elementary membrane has been found after one evolution. The membrane dissolution rules are of the form [u] ⟶ λ.
Membrane dissolution happens in membrane o_g of the subsystem and the corresponding elementary membrane o, for g = 1, 2, ..., sn and o = 2, 3, ..., q. After one evolution, each membrane in the subsystem transports a copy of its global best object u_{o_g}^gbest(t + 1) to nonelementary membrane 1. Figure 5 shows an example of membrane dissolution. At iteration t, the subsystem of elementary membrane 3 is generated, which contains sn membranes. After one evolution, each membrane transports a copy of its global best object u_{3_g}^gbest(t + 1), for g = 1, 2, ..., sn, to nonelementary membrane 1. Nonelementary membrane 1 selects the global best object v_3^gbest(t + 1) among the global best objects u_{3_g}^gbest(t + 1) of the membranes in the subsystem and the global best object u_3^gbest(t + 1) of elementary membrane 3. Assume that the global best object v_3^gbest(t + 1) comes from membrane 3_1 in the subsystem. Then membrane 3_1 replaces elementary membrane 3, while elementary membrane 3 and all other membranes 3_{g′} (g′ ≠ 1) in the subsystem are dissolved. After the replacement, membrane 3_1 becomes the new elementary membrane 3, which continues to perform the subsequent evolutions and computations.

Halting and Output.
The PSO-CP is a parallel computing system: all elementary membranes and their corresponding subsystems work in parallel, and each membrane, including the elementary membranes and the membranes in their subsystems, is a parallel computing unit. The extended clustering membrane system starts running from the initial membrane structure with q − 1 elementary membranes containing the initial objects. Each object represents a set of cluster centers. These objects are evolved based on the basic evolution rules. The elementary membranes and the nonelementary membrane interact through the communication rules, and the global best object in nonelementary membrane 1 is transported to the skin membrane 0. A subsystem is generated based on the membrane creation rules when an elementary membrane is trapped in a local optimum, and the local evolution rules are used to evolve the objects in the membranes of the subsystem. After one evolution, the membrane with the best object, among the membranes in the subsystem and the corresponding elementary membrane, is kept, and the others are dissolved based on the membrane dissolution rules. The evolution and communication rules for objects and the creation and dissolution rules for membranes execute iteratively during the evolution and computation process. The extended clustering membrane system continues to execute until the halting condition is satisfied, namely, the maximum number of iterations has been reached. When the system halts, the last global best object stored in skin membrane 0 is output to the environment, and its set of cluster centers is regarded as the final computed result of the clustering problem.
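The overall flow above can be condensed into a single-membrane sketch that evolves cluster-center objects, tracks local and global bests, halts at t_max, and outputs the global best; membrane creation, dissolution, and inter-membrane communication are omitted for brevity, and all parameter values are illustrative rather than the paper's settings:

```python
import numpy as np

def fitness(u, X, K, d):
    """Fitness (1) for an object u encoding K centers of dimension d."""
    Z = u.reshape(K, d)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def pso_cp_sketch(X, K, m=10, t_max=50, seed=0):
    """Single-membrane sketch of the PSO-CP loop (assumed parameters)."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    U = X[rng.choice(N, (m, K))].reshape(m, K * d)   # m objects = center sets
    V = np.zeros_like(U)
    f = np.array([fitness(u, X, K, d) for u in U])
    lbest, lf = U.copy(), f.copy()                   # local bests (8)
    g = lf.argmin()
    gbest, gf = lbest[g].copy(), lf[g]               # global best (9)
    for t in range(t_max):
        w = 0.4 + 0.5 * t / t_max                    # inertia weight (10)
        r1, r2 = rng.random((m, 1)), rng.random((m, 1))
        V = w * V + 2.0 * r1 * (lbest - U) + 2.0 * r2 * (gbest - U)  # (6)
        U = U + V                                                    # (7)
        f = np.array([fitness(u, X, K, d) for u in U])
        better = f < lf
        lbest[better], lf[better] = U[better], f[better]
        if lf.min() < gf:
            g = lf.argmin()
            gbest, gf = lbest[g].copy(), lf[g]
    return gbest.reshape(K, d), gf                   # halting output

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], float)
centers, gf = pso_cp_sketch(X, K=2)
```

In the full PSO-CP, q − 1 copies of this loop run in parallel, exchanging global bests through membrane 1, with subsystems created and dissolved around any copy that stagnates.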

Complexity Analysis.
In this subsection, the complexity of the PSO-CP is analyzed. As defined earlier, N is the number of data points in the dataset, m represents the number of objects in an elementary membrane, q represents the number of elementary membranes in the system, and t_max represents the maximum number of iterations. In the initialization process, the local best and global best in each of the elementary membranes 2 to q are found in a maximally parallel manner, so the complexity of initialization is O(m). The basic evolution rules in the elementary membranes are executed in parallel, and the time needed to execute a basic evolution rule for an object is O(N), since evaluating the fitness of an object requires a pass over all N data points. Hence, the complexity of one iteration is O(mN), and the overall complexity of the PSO-CP is O(t_max · m · N); this is independent of the number of elementary membranes q, because the membranes compute in parallel.

Experimental Results and Analysis
Computational experiments are conducted to evaluate the effectiveness of the PSO-CP. The datasets used in this study are introduced first. Four artificial datasets [51] are then used to tune the parameters of the PSO-CP. The clustering performance of the PSO-CP is compared with those of existing approaches on ten test datasets [52]. All clustering methods, including the PSO-CP, are implemented in MATLAB 2016b, and all experiments are conducted on a Dell desktop computer with an Intel 4.00 GHz i7-8550U processor and 8 GB of RAM under Windows 10.

Datasets.
Four artificial datasets and ten test datasets are used in the experiments. e four artificial datasets are used to tune the parameters of the PSO-CP, and the ten test datasets are used to test the clustering performance of the PSO-CP as compared with those of five other clustering approaches. e ten test datasets including three artificial datasets and seven real-life datasets have been used by researchers as benchmarks to test their clustering approaches. ese datasets are briefly described below. e seven artificial datasets, Data_5_2, Size_5, Square4, LineBlobs, Data_4_3, Data_9_2, and Square1, are manually generated and have been used in the existing literature. e seven real-life datasets, Iris, Newthyroid, Seeds, Yeast, Glass, Wine, and Lung Cancer, are from the UCI Machine Learning Repository. More details about these datasets are presented in Tables 1 and 2. e Lung Cancer dataset is high dimensional. It contains 32 data points, has 56 independent features, and describes three types of lung cancers. e Yeast dataset consists of 1484 data points and has 8 features.
The Iris and Glass datasets are not linearly separable in the Euclidean space.
These datasets, with different characteristics in shape, size, compactness, and symmetry, are used to evaluate the performance of the PSO-CP quantitatively.

Parameter Settings.
The number of elementary membranes and the number of membranes in the subsystems play important roles in the performance of the PSO-CP in solving clustering problems.
This section examines the influences of these two parameters, i.e., the number of elementary membranes q − 1 and the number of membranes sn in the subsystems, using the four artificial datasets.

Number of Elementary Membranes.
In order to evaluate the effects of the number of elementary membranes on clustering performance, the PSO-CP with two different degrees, i.e., q − 1 = 2 and q − 1 = 5, is used to find the optimal cluster centers for these datasets, with the fitness function measuring clustering quality [34].
For each degree, the PSO-CP ran 30 times on each dataset, and the mean (Mean) and standard deviation (S.D.) of the fitness values are presented in Table 3, which gives the results of the PSO-CP with different degrees on the four artificial datasets.
The best values of Mean and S.D. for each dataset are highlighted. The results show that the PSO-CP obtained smaller Mean and S.D. values when q − 1 = 5 elementary membranes are used, indicating that the PSO-CP with q − 1 = 5 also has better robustness. Figure 6 gives more details about the clustering performance of the PSO-CP on two datasets, Data_5_2 and Size_5. The clustering performance of the PSO-CP improves remarkably with the increase in the number of elementary membranes.
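The evaluation protocol used here, running each configuration 30 times and reporting the Mean and S.D. of the final fitness values, can be sketched as follows. This is a hypothetical Python helper, not the authors' code; `run_once` stands in for one complete run of the PSO-CP on a dataset.

```python
import statistics

def summarize_runs(run_once, n_runs=30):
    """Execute a stochastic clustering procedure n_runs times and return the
    mean (Mean) and sample standard deviation (S.D.) of its final fitness
    values, as reported in Tables 3 and 4."""
    values = [run_once() for _ in range(n_runs)]
    return statistics.mean(values), statistics.stdev(values)
```

A smaller S.D. across the repeated runs is what the robustness claims above refer to.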

Number of Membranes in the Subsystem.
The number of membranes in the subsystem also has an important influence on the performance of the PSO-CP. In order to tune this number, the clustering performances of the PSO-CP with sn = 6, 10, and 14 are examined [12]. All other parameters of the PSO-CP are kept at the same values. The Mean and S.D. values of the fitness function are presented in Table 4. The PSO-CP shows the best performance on these datasets when sn = 10. Figure 7 gives the boxplots of the fitness function values for each number of membranes in the subsystem for the Data_5_2 and Size_5 datasets.
Considering the results for all four datasets, the PSO-CP with sn = 10 has the best overall performance.

Comparison with Other Clustering Approaches.
To evaluate the effectiveness of the PSO-CP, its performance is compared with those of the standard particle swarm optimization (PSO) [49], fitness-distance-ratio particle swarm optimization (FDR-PSO) [53], fitness-Euclidean-ratio particle swarm optimization (FER-PSO) [42], starling particle swarm optimization (SPSO) [12], and environment particle swarm optimization (EPSO) [13] on the ten test datasets. Although many other advanced PSO clustering approaches exist, the ones selected for comparison are the major references for the development of, and are the most relevant to, the PSO-CP. In the FDR-PSO and FER-PSO approaches, the neighbor of the particle with the best FDR or FER value is selected to replace the local best in velocity updating. In the SPSO approach, a mechanism is used to lead the search out of a local optimum when stagnation occurs, and the implementation of the environment factor in the EPSO approach is similar to the active membranes in this study. Table 5 reports the parameter values of all the comparative clustering approaches used in the experiments. In the SPSO approach, the information of neighbors is used to guide the search direction of a particle, and the number of neighbors has an important influence on performance. SPSO copies the current population to multiple populations when the cumulative number of stagnant iterations reaches a previously set threshold, called the stagnation limit, which indicates that the particles are trapped in a local optimum; the number of subpopulations is a decisive factor for the convergence rate of the approach. This population operation is similar to the creation and dissolution of membranes in the PSO-CP. The velocity control of FER-PSO is an adjustable parameter that balances the previous and current velocities of a particle, and the learning factor controls the influence of the environment factor in EPSO.
The values of all these adjustable parameters are the best values reported in the respective publications. Each clustering approach, including the PSO-CP, ran 50 times on each dataset. Simple statistics, including the worst value (Worst), the best value (Best), the Mean, and the S.D. of the fitness function, are used as the evaluation criteria. The experimental environment is the same for all these comparative clustering approaches. Figure 8 shows the convergence of these clustering approaches on the ten test datasets for typical runs. The fitness value obtained by the PSO-CP declines faster at the beginning of the evolution process and then converges well on each dataset. The fitness function values of PSO, FDR-PSO, and FER-PSO decrease slowly at the beginning of the evolution process and do not show apparently better convergence performance than the other approaches. Because the neighbor with the best FDR or FER value is used to replace the local best of the particle, the convergence of FDR-PSO and FER-PSO is not as fast as that of PSO. Although SPSO and EPSO show better performance than the above clustering approaches, they are also easily trapped in local optima, as shown in parts (b), (g), (h), and (j) of Figure 8. Therefore, as shown in Figure 8, the PSO-CP has better convergence speed and higher clustering quality than the comparative approaches on all these datasets. Simple statistics of the fitness function values of these clustering approaches on these datasets are reported in Table 6. The results in Table 6 show that the PSO-CP has the overall best performance on the ten test datasets. Because of the characteristics of the test datasets, some clustering approaches performed better on specific datasets, with smaller mean values, but the performance of the PSO-CP on these datasets remains comparable.
Compared with the other clustering approaches, the PSO-CP is more robust, with smaller S.D. values of the fitness function, and its performance is more stable than that of PSO through the use of the extended cell-like P system.
To investigate the performance of the PSO-CP further, the average values of the fitness function are compared with those of the other clustering approaches using the Friedman test. The null hypothesis is that all the clustering approaches in this experiment have the same performance on any one dataset. Mathematically, the Friedman test works as follows [54]. The ten test datasets are treated as a random sample, and each clustering approach is considered a treatment. The average fitness values of the clustering approaches on each dataset are ranked from the largest to the smallest [55]. The rank of clustering approach j on clustering problem i is denoted by r_ij. The mean of these ranks is (1/2)(p + 1), in this case 3.5, where p is the number of treatments. The Friedman test statistic χ²_r is given by

χ²_r = [12 / (np(p + 1))] Σ_{j=1}^{p} R_j² − 3n(p + 1),

where R_j = Σ_{i=1}^{n} r_ij is the rank sum of approach j and n is the number of rows, i.e., datasets, 10 in this case. The Friedman test statistic follows a chi-squared distribution with p − 1 degrees of freedom. The ranks of the fitness values obtained by the clustering approaches for each dataset are presented in Table 7. The PSO-CP is ranked first on nine of the ten test datasets. The Friedman test statistic computed from Table 7 is χ²_r = 46.2286. With p − 1 = 5 degrees of freedom, the critical value is χ² = 11.07 at the significance level α = 0.05. Hence, the conclusion of the Friedman test is to reject the null hypothesis, i.e., the treatment levels are significantly different. In this case, the different clustering approaches obtained significantly different fitness function values.
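The Friedman computation described above can be reproduced from a rank table like Table 7. The sketch below is a plain NumPy illustration rather than the authors' code; it ranks each row from largest to smallest and evaluates χ²_r = [12/(np(p + 1))] Σ R_j² − 3n(p + 1). Ties within a row are ignored for simplicity, so values may differ slightly from a tie-corrected statistic.

```python
import numpy as np

def friedman_statistic(table):
    """Friedman chi-squared statistic for an (n datasets) x (p approaches)
    table of average fitness values. Each row is ranked independently
    (rank 1 = largest value, rank p = smallest); ties are ignored."""
    table = np.asarray(table, dtype=float)
    n, p = table.shape
    order = np.argsort(-table, axis=1)          # columns sorted descending
    ranks = np.empty_like(table)
    rows = np.arange(n)[:, None]
    ranks[rows, order] = np.arange(1, p + 1)    # assign ranks 1..p per row
    col_sums = ranks.sum(axis=0)                # R_j, rank sum of approach j
    return 12.0 / (n * p * (p + 1)) * (col_sums ** 2).sum() - 3.0 * n * (p + 1)
```

As a sanity check, when every row ranks the p approaches identically, the statistic attains its maximum value n(p − 1); for n = 10 datasets and p = 6 approaches this is 50.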
In clustering problems, the F-measure is sometimes used to measure the quality of clustering [35]. Each data point in a dataset belongs to a specific class, i.e., has a specific label, although the label is usually unknown in clustering problems. Let b_l represent class l in reality and c_k represent cluster k obtained by a clustering approach, for l, k = 1, 2, …, K. The number of data points belonging to b_l is denoted by |b_l|, the number of data points belonging to c_k is denoted by |c_k|, and the number of data points belonging to both b_l and c_k is denoted by |b_l ∩ c_k|. The precision of class l and cluster k is defined as

p(l, k) = |b_l ∩ c_k| / |c_k|.

The recall of class l and cluster k is defined as

r(l, k) = |b_l ∩ c_k| / |b_l|.

The F-measure of class l and cluster k is then

F(l, k) = 2 p(l, k) r(l, k) / (p(l, k) + r(l, k)).

The clustering results of a good clustering approach should be close to the actual classes in the dataset [56]. The overall F-measure of the clustering results is

F = Σ_l (|b_l| / N) max_k F(l, k).

When clustering approaches are compared, the approach with a larger value of the F-measure is more effective. Table 8 provides the Mean and S.D. of the F-measure values of the comparative clustering approaches on the ten test datasets. As the results in Table 8 show, the PSO-CP clearly has the best overall performance according to the F-measure among all these clustering approaches. The classification of a data point is correct or accurate if it is clustered into the right class or cluster [56]. Therefore, the classification rate, represented by A and also called clustering accuracy, is also used to evaluate the performance of the clustering approaches. The classification rate of a dataset is defined as the proportion of correctly classified data points:

A = E / N,

where E is the number of correctly classified data points. Table 9 provides the classification rates of the comparative clustering approaches on the ten test datasets.
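The F-measure and classification-rate computations defined above can be sketched as follows. This Python illustration is not the authors' implementation; in particular, the majority-vote mapping used to count the correctly classified points E is an assumption, since the text does not spell out how clusters are matched to classes.

```python
from collections import Counter

def overall_f_measure(labels, clusters):
    """Overall F-measure: F = sum_l (|b_l| / N) * max_k F(l, k),
    with precision |b_l ∩ c_k| / |c_k| and recall |b_l ∩ c_k| / |b_l|."""
    N = len(labels)
    classes = Counter(labels)               # |b_l|
    cluster_sizes = Counter(clusters)       # |c_k|
    joint = Counter(zip(labels, clusters))  # |b_l ∩ c_k|
    total = 0.0
    for l, bl in classes.items():
        best = 0.0
        for k, ck in cluster_sizes.items():
            inter = joint.get((l, k), 0)
            if inter == 0:
                continue
            prec, rec = inter / ck, inter / bl
            best = max(best, 2 * prec * rec / (prec + rec))
        total += (bl / N) * best
    return total

def classification_rate(labels, clusters):
    """A = E / N, where E counts points whose cluster's majority label
    matches their own (majority-vote cluster-to-class mapping; assumed)."""
    by_cluster = {}
    for l, k in zip(labels, clusters):
        by_cluster.setdefault(k, Counter())[l] += 1
    E = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return E / len(labels)
```

A perfect clustering yields F = 1 and A = 1, and both measures decrease as clusters mix points from different classes.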
Although the PSO-CP obtained a classification rate lower than that of FER-PSO on the Square1 dataset, it obtained better classification rates than the other clustering approaches on all the other test datasets. Overall, the PSO-CP has the highest mean classification rates among the six comparative clustering techniques. The Friedman test is also applied to the Means of the classification rates. The computed Friedman test statistic is χ²_r = 50.4. With p − 1 = 5 degrees of freedom, the critical value is χ² = 11.07 at the significance level α = 0.05. Therefore, these clustering approaches obtained significantly different classification rates.
Compared with the other improved PSO procedures, the PSO-CP achieves better F-measure values and better classification rates. Therefore, the extended cell-like P system helps the PSO-CP improve its clustering performance, and the introduction of membrane systems offers a new way for PSO to solve clustering problems.

Conclusions
An extended membrane system combining an extended cell-like P system and the PSO mechanism, called the PSO-CP, is developed to solve clustering problems. This extended P system, built under the framework of membrane computing using a cell-like P system with active membranes, integrates the rules for objects and membranes. Different from the existing evolutionary clustering techniques, the PSO-CP uses basic evolution rules for objects based on the standard velocity and position updating rules of the particles in PSO, together with communication rules to transfer the best objects between membranes. Subsystems containing membranes are specially designed to avoid prematurity, and a modified evolution mechanism is used to evolve objects in the subsystems.
The PSO-CP is evaluated on ten test datasets from the Artificial Datasets collection and the UCI Machine Learning Repository, and the computational results clearly exhibit the effectiveness of the proposed membrane system in solving clustering problems as compared with five existing PSO clustering methods. P systems, as parallel computing models, are highly effective and efficient in solving optimization problems with linear or polynomial complexity, and these evolution-based parallel computing models provide new ways of solving clustering problems. The extended membrane system uses the cell-like P system as its computation structure, and the communication rules between membranes are unidirectional. Although these unidirectional communication rules are simple and easy to implement, bidirectional communication rules may be introduced in future studies to further accelerate convergence and improve population diversity. More complicated communication structures between membranes may also be explored to improve the performance of the approach. Furthermore, the experiments only used small datasets from the Artificial Datasets collection and the UCI Machine Learning Repository, so the proposed approach may have limitations on high-dimensional and large datasets; future studies may test the effectiveness of the PSO-CP on large datasets. Balancing the local and global search abilities also remains a hard problem to resolve. Future studies may further focus on extended membrane systems based on tissue-like P systems and other bio-inspired computing models, and more work is needed to apply these extended membrane systems to automatic and multiobjective clustering problems.
Data Availability
The seven artificial datasets, manually generated and often used in the existing literature, are from the Artificial Datasets collection, available at https://www.isical.ac.in/content/databases (accessed June 2018). The seven real-life datasets are from the UCI Machine Learning Repository, available at http://archive.ics.uci.edu/ml/datasets.html (accessed June 2019).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.