A Decomposition Method for Both Additively and Nonadditively Separable Problems

Minyang Chen, Wei Du, Member, IEEE, Yang Tang, Senior Member, IEEE, Yaochu Jin, Fellow, IEEE, and Gary G. Yen, Fellow, IEEE

Abstract—Problem decomposition is crucial for coping with large-scale global optimization problems, and it relies heavily on highly precise variable grouping methods. The state-of-the-art decomposition methods identify separability based on the finite differences principle, which is valid only for additively separable functions and is not applicable to nonadditively separable functions. Therefore, separability must be investigated in more depth in order to propose a more general principle and design more universal decomposition methods. In this article, we conduct a comprehensive theoretical investigation on separability, the core of which is an innovative separability identification principle: the minimum points shift principle. By utilizing the new principle, we develop a general separability grouping (GSG) method that can handle both additively and nonadditively separable functions with high accuracy. In addition, we design a new set of benchmark functions based on nonadditive separability, which compensates for the lack of nonadditively separable functions in previous test suites. Extensive experiments demonstrate that the proposed GSG achieves high grouping accuracy on both the new and CEC series benchmark problems, especially on nonadditively separable problems. Finally, we verify through optimization experiments that the proposed GSG can effectively improve the optimization performance on nonadditively separable problems.

I. INTRODUCTION
THE TERM large-scale global optimization (LSGO) typically refers to solving optimization problems that involve thousands of decision variables or more by heuristics and metaheuristics [1]. In real-world applications, researchers are increasingly likely to encounter such LSGO problems. Due to the exponential expansion of the search space, the complexity of optimization problems increases significantly, and the computational resources required to solve them grow dramatically. This leads to the so-called "curse of dimensionality," i.e., the performance of the algorithms decreases rapidly as the dimensionality of the problem increases.
Evolutionary algorithms (EAs) [2] are promising for tackling complex, black-box LSGO problems because they have excellent global optimization capability and only require implicit objective functions. However, when it comes to LSGO problems and the curse of dimensionality, EAs are still not effective enough. Designing specific frameworks or improving existing EAs to reduce the difficulties brought by high dimensionality has become a hot topic in the research community. In the simplest division, decomposition-based methods and nondecomposition-based methods are the two most prevalent approaches in LSGO [3]. Decomposition-based methods are known as "divide and conquer" approaches, which decompose a given high-dimensional problem into a set of lower-dimensional ones and deal with them separately in the cooperative co-evolution (CC) framework. Representative algorithms include differential evolution with CC and grouping (DECC-G) [4], and CC with global differential grouping (DG) and covariance matrix adaptation evolution strategy (CC-GDG-CMAES) [5]. By contrast, nondecomposition-based methods optimize all decision variables integrally by memetic algorithms, hybrid algorithms, or enhanced optimizers, e.g., self-adaptive differential evolution with multitrajectory search (SaDE-MMTS) [6], the multiple offspring sampling (MOS)-based hybrid algorithm [7], and the competitive swarm optimizer (CSO) [8].
Decomposition-based methods are the current mainstream. They are clearly structured and rest on a complete theoretical foundation. Decomposition-based methods consist of two steps: 1) variable grouping and 2) cooperative optimization. Variable grouping refers to assigning certain variables together to form multiple groups, and it plays a dominant role in an algorithm's overall performance. The basis for variable grouping is separability [9], an inherent property describing functions whose variables can be separately optimized to reach the global optimum. Therefore, identifying the separability among variables has become one of the critical issues in the field of LSGO.
Over the past decade, variable grouping methods have developed greatly along with the improvement of separability identification principles. In earlier studies, researchers had not found a way to detect the separability between variables in continuous optimization. Early decomposition methods tended to divide variables directly or randomly, e.g., n one-dimensional groups [10], [11], half-grouping [12], and fixed-size k s-dimensional grouping [13]. Later, dynamic grouping strategies were adopted [4], [14]: variables are rearranged and regrouped in each CC cycle so that the probability of grouping nonseparable variables together increases. However, the group size and group number are predetermined and fixed. To accommodate problems with different levels of separability, researchers have also developed methods that adaptively adjust the group size [15], [16]. However, these methods are valid only for fully separable problems and cannot handle problems containing nonseparable variables, which in turn prevents cooperative co-evolutionary algorithms (CCEAs) from obtaining a global optimum in the CC phase.
To identify separability accurately, researchers have attempted to find clues from the functional landscape, finite differences, and the information generated during optimization. The most common separability identification principles can be divided into three categories: 1) monotonicity detection [17]; 2) finite differences [18]; and 3) others.
1) Monotonicity Detection: This principle identifies separability according to the functional landscape. It judges two variables (or variable sets) as nonseparable if distortions are detected in the function's landscape, i.e., when the fitness relationship of two pairs of detection points changes [17], [19]. Researchers usually refer to decomposition methods employing this principle as variable interaction learning (VIL). It is reported in [18] that VIL is a successful exploration of separability, even though its decomposition accuracy is not good enough and it costs too many computational resources.
In [20], by simultaneously detecting the interaction between two sets of variables, the consumption of computational resources is greatly reduced. Moreover, the method proposed in [21], combined with a marginalized denoising model, reduces the computational cost of VIL even further. It is noteworthy that theoretical proofs of convergence show that if the problem decomposition given by VIL is correct, each subcomponent can be optimized to its global optimum [21], [22]. However, there is no theory showing a necessary and sufficient relationship between monotonicity and separability. The monotonicity detection principle is characterized by its adaptability to various types of separability landscapes, but its major problem is the misclassification of nonseparability, i.e., it does not ensure that nonseparability is identified every time. VIL therefore requires multiple changes of detection position to be effective, leading to large computational resource consumption and low decomposition accuracy.
2) Finite Differences: This principle is based on the properties of additive separability. If two variables x_i and x_j are additively separable, their mixed partial derivative is 0, i.e., ∂²f/∂x_i∂x_j = 0, which results in the same finite difference of f(x) when perturbing x_i under different values of x_j. DG [18] first adopted the finite differences principle. Since then, a number of methods have been proposed based on DG. Their improvements mainly focus on decomposition accuracy and computational resource consumption. Some improved DG variants address the rounding errors that arise during the computation, enhancing decomposition accuracy [5], [23], [24], [25]. The methods in [5], [23], [26], and [27] make improvements for indirect interaction and overlapping problems. In addition to raising the accuracy, some other methods concentrate on reducing the computational resource consumption. The idea of set-set detection was applied in [28] and [29], which significantly speeds up the decomposition process. In [30] and [31], thorough algorithmic complexity analyses have been provided. It can be seen from the above works that finite-differences-based decomposition methods have become the most effective and advanced problem decomposition methods. Compared with monotonicity detection, finite differences detection needs only one test to identify the interaction between two variables, making the decomposition methods more efficient and accurate. In terms of theoretical guarantees, the finite differences principle was first proved to be derived from additive separability in [18]. Next, Sun et al. [26] stated that the equal difference arises from the second-order partial derivative between the two variables being 0 in additively separable problems. Finally, the finite differences principle for set-set detection was proved by line integrals [29] and Taylor expansion [32]. Finite differences improve the decomposition accuracy significantly and substantially reduce the computational resource consumption of grouping methods. Nevertheless, this principle is derived from additive separability and therefore fails for functions that are separable but not additively separable. Such functions, which are separable but cannot be expressed as additive sums, are called nonadditively separable functions [33] (e.g., Ackley's function). The finite differences principle mistakenly identifies nonadditive separability as nonseparability.

3) Others: This type of method utilizes information from the optimization process to infer the separability among variables. Such information includes the optimization intervals of variables [34], the correlation of variables in the population (Pearson correlation coefficient [35] and mutual information [36], [37]), contribution [38], [39], etc. This information implicitly exhibits certain separability features, although it is not accurate enough.
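As a concrete illustration (not part of any DG implementation), the following minimal sketch shows how the finite differences principle can be used to test whether two variables interact: the objective is perturbed along x_i under two different values of x_j, and a difference in the resulting deltas beyond a tolerance indicates interaction, i.e., the absence of additive separability between x_i and x_j. The functions and tolerance below are illustrative assumptions.

```python
import numpy as np

def interacts(f, x, i, j, step=1.0, eps=1e-8):
    """Finite-differences interaction check (illustrative sketch).

    Returns True if perturbing x_i yields different objective deltas
    under two different values of x_j, which rules out additive
    separability between x_i and x_j.
    """
    a = np.array(x, dtype=float)
    b = a.copy()
    b[j] += step                      # change the context of x_j

    def delta(base):
        pert = base.copy()
        pert[i] += step               # perturb x_i
        return f(pert) - f(base)

    return abs(delta(a) - delta(b)) > eps

# Example: x1 and x2 interact in (x1 * x2)^2 but not in x1^2 + x2^2.
f_nonsep = lambda x: (x[0] * x[1]) ** 2
f_addsep = lambda x: x[0] ** 2 + x[1] ** 2
print(interacts(f_nonsep, [1.0, 1.0], 0, 1))   # True
print(interacts(f_addsep, [1.0, 1.0], 0, 1))   # False
```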
The advantage of such methods is that they do not require additional computational resources to decompose the problem but instead exploit the information emerging from the optimization process. However, the drawbacks are also obvious: they are inaccurate, unreliable, and lack theoretical support.
In general, the finite differences principle is currently the most effective and accurate separability identification principle. It is significantly better than the other two types of principles in terms of grouping accuracy and computational resource consumption, so it is adopted by the state-of-the-art grouping methods. However, its deficiency with respect to nonadditive separability makes DG-series grouping methods unable to handle nonadditively separable functions.
To address the lack of research on nonadditive separability and the failure of the state-of-the-art grouping methods on nonadditively separable functions, we conduct a comprehensive theoretical investigation that extends the study from the existing additive separability to general separability (i.e., including both additive and nonadditive separability). The core of the theoretical research is the minimum points shift principle, which can effectively identify general separability. Based on the proposed principle, a novel grouping method is developed, called general separability grouping (GSG). We provide rigorous mathematical proofs for the proposed theories and verify the performance of GSG with extensive simulations. Our contributions can be summarized as follows.
1) Comprehensive theoretical analyses of separability are given, forming a complete theoretical system that includes five new definitions, two newly found forms of nonadditive separability, and two newly proposed separability identification theorems. The two proposed theorems are collectively called the minimum points shift principle.

2) A novel variable grouping method called GSG is devised based on the new principle. GSG can efficiently identify general separability and nonseparability, and then decompose the problem with high precision.

3) A nonadditive separability benchmark (including 12 test problems) is designed according to the newly found nonadditive separability forms. The proposed GSG is then compared with the state-of-the-art decomposition methods on the proposed and CEC series benchmarks. Extensive experiments demonstrate that GSG can decompose the problems accurately, leading to excellent optimization results by the CC optimizer.

The remainder of this article is arranged as follows. We first provide our extended theories on separability in Section II. Section III presents the proposed grouping method, GSG. Section IV describes a newly designed benchmark based on nonadditive separability. Then, the experimental results are shown in Section V along with some relevant discussions. Finally, the conclusion and future work are given in Section VI.

II. NEW SEPARABILITY THEORIES
Separability is the basis of variable grouping. Before designing the problem decomposition method, it is necessary to make systematic theoretical analyses of separability. In this section, we extend the theories of additive separability to general separability. These theories do not assume the differentiability of functions, so they can be applied to any type of function.

A. Basic Definitions
Two basic definitions of separable functions are presented below. The entire theoretical investigation is based on these two basic definitions.
1) Separability: Separability is an inherent property of a function. This property indicates that the global optimum can be obtained by separately optimizing each variable group in turn while fixing the others.
Definition 1 (Separability [9], [40]): If the variables of an n-dimensional function f(x) can be divided into m (m ≥ 2) mutually exclusive variable groups (X_1, X_2, ..., X_m) such that

arg min_{x ∈ Ω} f(x) = (arg min_{X_1} f(X_1, ∼, ..., ∼), ..., arg min_{X_m} f(∼, ..., ∼, X_m))

then the function is defined as a separable function. Here, X_i^* is the global optimum of X_i and Ω is the domain of x. The symbol "∼" (which will be used later as well) indicates that the variables at this position take any fixed values in their domain.
Example 1: Consider a separable function of five variables that can be divided into three groups: X_1 and X_2, which partition {x_1, x_2, x_3, x_4}, and X_3 = {x_5}. Each group can be optimized independently regardless of the other variables.
Remark 1: The concept of "separability" in this article refers to "general separability," which is different from the commonly used "additive separability" (see Definition 3). General separability includes both additive separability and nonadditive separability.
Remark 2: If a function does not satisfy Definition 1, it is defined as a nonseparable function. If a function satisfies Definition 1 and each variable group contains exactly one variable, it is called a fully separable function. Otherwise, it is called a partially separable function.
2) Nonseparable Variable Group: In order to study each variable group of a separable function individually, we propose the concept of nonseparable variable groups.
Definition 2 (Nonseparable Variable Group): When f(x) is completely decomposed, each formed variable group X_i in (2) is defined as a nonseparable variable group (abbreviated as nonseparable group). X_i satisfies

arg min_{X_i ∈ Ω_i} f(∼, X_i, ∼) = X_i^*

where Ω_i is the domain of X_i. The independent minimum point of X_i is always equal to its true minimum point X_i^*, regardless of the other variables.
Remark 3: Nonseparable groups are used to describe the variable groups formed after a complete (or ideal) decomposition. In Example 1, X_1, X_2, and X_3 are nonseparable groups.
Since an incomplete grouping generates groups of a larger size, even though these larger groups still satisfy separability, they are not nonseparable groups according to this definition. For example, {x_1, x_2, x_3, x_4} in Example 1 is not a nonseparable group because it can be divided further. The defined nonseparable group X_i thus indicates the case where a variable group has the fewest elements while still ensuring separability.
A nonseparable variable group was earlier considered as the set of all interacting variables [5], [26] (see the remark of Definition 3 for the definition of interaction). Definition 2 provides a more fundamental elaboration of the nonseparable variable group.
The goal of variable grouping is to accurately divide all nonseparable groups from the decision vector.
In this article, the symbol X_i denotes by default a variable group whose separability information is known (or obtained by grouping methods), while V_i denotes by default a variable set whose separability information is unknown (or to be detected).

B. Extension of Separability
Functions with certain particular forms must be separable. In the following, we introduce three forms of separability: 1) additive separability; 2) multiplicative separability; and 3) composite separability. Finally, we give the notion of strongly and weakly separable functions.
1) Additive Separability: Additive separability is the class of separability that has been widely investigated in recent studies. It describes functions that can be expressed as additive sums of several subfunctions.
Definition 3 (Additive Separability [41]): If the variables of an n-dimensional function f(x) can be divided into m (m ≥ 2) variable groups (X_1, X_2, ..., X_m) satisfying

f(x) = Σ_{i=1}^{m} f_i(X_i)

then the function is additively separable.

Example 2: Any function that can be written as a sum of subfunctions over disjoint variable groups, e.g., f(x) = f_1(x_1, x_2) + f_2(x_3), is additively separable.

Remark 4: It is obvious that additively separable functions satisfy separability (for a short proof, see [5]). Accordingly, nonadditively separable functions are defined as functions that are separable but not additively separable [33].
For additively separable functions, there are two important definitions: 1) direct interaction and 2) indirect interaction. Direct interaction means that two variables x_i and x_j satisfy ∂²f/∂x_i∂x_j ≠ 0, while indirectly interacting variables do not satisfy this condition but jointly interact with some other variable.
Furthermore, the finite differences principle [18] in DG identifies separability by detecting interaction.
2) Multiplicative Separability: In addition to additive separability, we find that certain functions consisting of a product of subfunctions also satisfy separability. Hence, we name them multiplicatively separable functions.

Definition 4 (Multiplicative Separability): If the variables of an n-dimensional function f(x) can be divided into m (m ≥ 2) variable groups (X_1, X_2, ..., X_m) satisfying

f(x) = Π_{i=1}^{m} f_i(X_i)

where each subfunction f_i(X_i) is either constantly positive or constantly negative, then the function is multiplicatively separable.
Multiplicatively separable functions satisfy separability (Definition 1). Thus, they are nonadditively separable.
Proof: When optimizing the variable group X_i independently, we have

arg min_{X_i} f(∼, X_i, ∼) = arg min_{X_i} f_i(X_i) · Π_{j=1, j≠i}^{m} f_j(X_j).

Here, we abbreviate Π_{j=1, j≠i}^{m} f_j(X_j) as P. Since each subfunction f_j(X_j) is constantly positive or negative, the sign of their product P is also constant, i.e., P > 0 or P < 0. Two cases then follow:

arg min_{X_i} f(∼, X_i, ∼) = arg min_{X_i} f_i(X_i), if P > 0
arg min_{X_i} f(∼, X_i, ∼) = arg max_{X_i} f_i(X_i), if P < 0.

Thus, searching for the optimum of X_i is only related to its own subfunction f_i(X_i). Letting arg min_{X_i} f_i(X_i) (or arg max_{X_i} f_i(X_i)) be X_i^*, the above equation can be written as

arg min_{X_i} f(∼, X_i, ∼) = X_i^*.

This demonstrates that X_i is a nonseparable group (Definition 2) of f(x). Similarly, the variables of the other subfunctions can also form nonseparable groups. Therefore, f(x) is a separable function.

Example 3: A product of subfunctions that are each constantly positive (or constantly negative) over disjoint variable groups is multiplicatively separable.

Multiplicative separability is originally a mathematical concept, which simply means that a function can be expressed as the product of several subfunctions [42]. However, such functions can be optimized separately only when each subfunction is constantly positive or negative.
3) Composite Separability: Wang et al. [43] found a case of composite functions satisfying separability. Here, we formalize it as a specific class of separability called composite separability.
Definition 5 (Composite Separability): If f(x) is a composite function consisting of an inner function g(x) and an outer function U(·), i.e., f(x) = U(g(x)), and the function satisfies the following:
1) g(x) is a separable function;
2) U(·) is monotonically increasing in its domain;
then the function is a compositely separable function.
Proof: Since the inner function g(x) is separable, suppose that X_i is a nonseparable group of g(x), i.e.,

arg min_{X_i} g(∼, X_i, ∼) = X_i^*.

As X_i^* is the minimum point of X_i and U(·) is monotonically increasing, for any X_i we have g(∼, X_i^*, ∼) ≤ g(∼, X_i, ∼) and hence U(g(∼, X_i^*, ∼)) ≤ U(g(∼, X_i, ∼)), so the minimizers with respect to X_i are the same for g and U(g(·)). It results that

arg min_{X_i} U(g(∼, X_i, ∼)) = X_i^*.

As for the composite function f(x), we get

arg min_{X_i} f(∼, X_i, ∼) = X_i^*

which means X_i is also a nonseparable group of f(x). Similarly, it follows that the other nonseparable groups of g(x) are also nonseparable groups of f(x). Hence, f(x) is a separable function.

Example 4: An example of a compositely separable function is f(x) = ln(x_1^2 + x_2^2 + 1), where the inner function g(x) = x_1^2 + x_2^2 + 1 is separable and the outer function U(·) = ln(·) is monotonically increasing.
Definitions 4 and 5 show two types of nonadditive separability. However, in addition to these two types, there are many other undiscovered forms of nonadditive separability.
A significant distinction between additively and nonadditively separable functions lies in the finite differences with respect to a particular variable. The finite differences of an additively separable function are all the same, while those of a nonadditively separable function are not always the same. Fig. 1 illustrates this distinction using vertical sections of the 3-D function graphs. We calculate the finite differences of f(x_1, x_2) as x_2 goes from −5 to 0 when x_1 is fixed to three different values. Fig. 1(a) shows an additively separable function, for which the computed values are all 25, while Fig. 1(b) shows a nonadditively separable (compositely separable) function, for which the values vary. However, both functions are separable according to Definitions 3 and 5. The finite differences principle in DG determines nonseparability by the distinction between these values, and therefore misidentifies nonadditive separability as nonseparability.
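The following minimal sketch (illustrative only; the concrete functions are assumptions, not the ones plotted in Fig. 1) reproduces this effect numerically: the finite difference along x_2 is constant for an additively separable function but varies with x_1 for a compositely separable one, even though both functions are separable.

```python
import numpy as np

def finite_diff(f, x1, a=-5.0, b=0.0):
    """Finite difference of f along x2 (from a to b) with x1 fixed."""
    return f(x1, b) - f(x1, a)

# Assumed illustrative functions (not necessarily those in Fig. 1).
f_add  = lambda x1, x2: x1 ** 2 + x2 ** 2                 # additively separable
f_comp = lambda x1, x2: np.log(x1 ** 2 + x2 ** 2 + 1.0)   # compositely separable

for x1 in (-2.0, 0.0, 2.0):
    print(x1, finite_diff(f_add, x1), finite_diff(f_comp, x1))
# The additive column is constant (-25.0 here), the composite column varies,
# so a finite-differences test would wrongly flag x1 and x2 as nonseparable
# for the composite function.
```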
4) Strong Separability and Weak Separability: It can easily be observed that additively separable functions remain separable on any subspace of the domain, but this is not necessarily the case for generally separable problems. We introduce the concepts of strong and weak separability to describe the degree of separability. In layman's terms, strong separability means being separable everywhere, while weak separability means being separable only in the original domain.

Definition 6 (Strong Separability and Weak Separability): A function f(x) is defined to be strongly separable if each of its nonseparable groups X_i can still be optimized independently regardless of any constraints it is subject to. In other words, X_i satisfies

arg min_{X_i ∈ Ω'_i} f(∼, X_i, ∼) = const

where const is a constant vector and Ω'_i is the constraint space of X_i. In contrast to (4) in Definition 2, each nonseparable group of a strongly separable function converges to a certain constant vector const; this is because the global minimum point X_i^* may not be included in the constraint space Ω'_i. By contrast, weakly separable functions only guarantee that X_i can be optimized independently in its original domain; X_i may not remain separable when its variables are subject to certain constraints.
Strong separability is a prerequisite for the separability identification principle in the next section, as it ensures that the size of the detected variable set can be arbitrarily reduced. Strong separability also ensures that functions are suitable for divide and conquer without being affected by constraints or population aggregation (caused by local optima).
Additively, multiplicatively, and compositely (with strongly separable inner functions) separable functions all satisfy strong separability (see Section S.I of the Supplementary Material for the relevant proofs). Fig. 2 is a Venn diagram of the relationships among the various forms of separability.
We show two examples of weakly separable functions in Section S.II of the Supplementary Material.
In this article, the separable functions to be decomposed are assumed by default to be strongly separable.

C. Minimum Points Shift Principle
In this section, we present the core theory of this article, the minimum points shift principle, which consists of two subprinciples that discriminate the separability relationship between any two variable sets.

1) Independent Global Minimum Points Shift Principle:
This principle contains a new definition and a theorem that infer the separability relationship between two variable sets from the shift of their independent global minimum points (i.e., the interplay described below).
If a variable set is affected by other variable sets and cannot be optimized independently, we say that it interplays with them (the term is chosen to distinguish it from interaction). Technically, interplay and interaction are equivalent under additive separability.
Definition 7 (Interplay): In f(x), V_i and V_j are two mutually exclusive variable sets. When V_j takes different fixed values in its domain Ω_j, if the independent minimum points of V_i shift, i.e.,

∃ V_j^{(a)}, V_j^{(b)} ∈ Ω_j : arg min_{V_i} f(∼, V_i, ∼, V_j^{(a)}, ∼) ≠ arg min_{V_i} f(∼, V_i, ∼, V_j^{(b)}, ∼)

then we define that V_i interplays with V_j.

Theorem 1: Let f(x) be a strongly separable function, and let V_i and V_j be two mutually exclusive variable sets. If V_i interplays with V_j, then V_i and V_j contain variables from the same nonseparable group(s).
Proof: Here, we prove the contrapositive of the above proposition, i.e., if V_i and V_j do not contain variables from the same nonseparable group, then V_i does not interplay with V_j.
First, we suppose that the variables in V_i are subordinate to the nonseparable groups X_i, X_j, ..., X_k. We use X to denote the union of these groups, so that V_i ⊆ X. Since V_i and V_j do not contain variables from the same nonseparable group, there must be X ∩ V_j = ∅. Recalling the property of nonseparable groups in (4), X satisfies the same property as a single nonseparable group:

arg min_X f(∼, X, ∼) = X^*.

However, X is in fact constrained. We denote by V̄_i the complement of V_i within X (i.e., V_i ∪ V̄_i = X). When V̄_i takes a fixed value V̄_i^0, this can be treated as a constraint V̄_i = V̄_i^0. Thus, the independent optimization of X becomes

arg min_X f(∼, X, ∼) subject to V̄_i = V̄_i^0.

Since f(x) is a strongly separable function, applying the property of strong separability (14) to the above equation, we have

arg min_X f(∼, X, ∼) subject to V̄_i = V̄_i^0 = const_1.

Splitting const_1 into two constant vectors corresponding to V_i and V̄_i, the part corresponding to V_i is itself a constant vector. Hence, we have

arg min_{V_i} f(∼, V_i, ∼) = const_2

for any fixed value of V_j, which can also be written as

arg min_{V_i} f(∼, V_i, ∼, V_j^{(a)}, ∼) = arg min_{V_i} f(∼, V_i, ∼, V_j^{(b)}, ∼) for all V_j^{(a)}, V_j^{(b)} ∈ Ω_j

that is, V_i does not interplay with V_j.
Here, the proof of the contrapositive is completed. Thus, the original proposition holds.
Remark 5: The proof logic is as follows. The contents in parentheses before "⇒" are the basis of deduction.

Original proposition: V_i interplays with V_j ⇒ V_i and V_j have some variables from the same nonseparable group(s).

Contrapositive: V_i and V_j have no variables from the same group. (Extend V_i to X) ⇒ X ∩ V_j = ∅. (Definition 2) ⇒ The minimum points of X do not shift. (Definition 6) ⇒ The minimum points of V_i do not shift, i.e., V_i does not interplay with V_j.

The above theorem indicates that there is a nonseparable relationship between two variable sets if their independent global minimum points shift. Hence, some variables in the two variable sets should be grouped together.

Example 5: An example of Theorem 1 is shown in Fig. 3. The function used here is a nonseparable function of two variables. Let x_1 be V_i and x_2 be V_j. When x_2 takes different fixed values, the independent global minimum points of x_1 are inconsistent, i.e., x_1 interplays with x_2. Then, according to Theorem 1, it follows that x_1 and x_2 are from the same nonseparable group.
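As a hedged illustration (the function below is an assumption chosen for demonstration, not the one plotted in Fig. 3), the following sketch shows how the independent minimum point of x_1 shifts with x_2 for a nonseparable function, which is exactly the interplay that Theorem 1 turns into a grouping decision.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed nonseparable demo function: the optimal x1 depends on x2.
f = lambda x1, x2: (x1 + x2) ** 2 + 0.1 * x1 ** 2

def argmin_x1(x2):
    """Independent minimum point of x1 with x2 held fixed."""
    return minimize_scalar(lambda x1: f(x1, x2), bounds=(-10, 10),
                           method="bounded").x

for x2 in (-2.0, 0.0, 2.0):
    print(x2, round(argmin_x1(x2), 4))
# The minimizer of x1 moves as x2 changes (interplay), so by Theorem 1
# x1 and x2 belong to the same nonseparable group.
```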
2) Independent Local Minimum Points Shift Principle: It has been shown that the separability relationship between two variable sets can be derived by detecting whether their independent global minimum points shift. However, for multimodal problems, it is usually difficult to find the independent global minimum points, whereas independent local minimum points are readily found. The principle proposed in this section determines the separability relationship using independent local minimum points.
Fig. 3. The independent global minimum points shift (interplay), determining that x_1 and x_2 are from the same nonseparable group.

Definition 8 (Local Interplay): In f(x), V_i and V_j are two mutually exclusive variable sets. Let V_i^{(*)} be an independent local minimum point of V_i and Ω̃_i be a tiny neighborhood of V_i^{(*)}. When V_j takes different fixed values, if the independent local minimum points of V_i shift within Ω̃_i, i.e.,

∃ V_j^{(a)}, V_j^{(b)} ∈ Ω_j : arg min_{V_i ∈ Ω̃_i} f(∼, V_i, ∼, V_j^{(a)}, ∼) ≠ arg min_{V_i ∈ Ω̃_i} f(∼, V_i, ∼, V_j^{(b)}, ∼)

then we define that V_i locally interplays with V_j.

Theorem 2: Let f(x) be a strongly separable multimodal function, and let V_i and V_j be two mutually exclusive variable sets. If V_i locally interplays with V_j, then V_i and V_j contain variables from the same nonseparable group(s).
Proof: Since V_i locally interplays with V_j, we have

∃ V_j^{(a)}, V_j^{(b)} ∈ Ω_j : arg min_{V_i ∈ Ω̃_i} f(∼, V_i, ∼, V_j^{(a)}, ∼) ≠ arg min_{V_i ∈ Ω̃_i} f(∼, V_i, ∼, V_j^{(b)}, ∼)

which is equivalent to V_i interplaying with V_j in the constrained space Ω̃_i.

Here, we use proof by contradiction, making the hypothesis that V_i and V_j do not contain variables from the same nonseparable group. In the same way as (16), extending V_i to X, and since f(x) satisfies strong separability, we have

arg min_X f(∼, X, ∼) = const in the constrained space.

According to (17)-(22), the independent optimization of X can be concentrated on V_i, from which it follows that

arg min_{V_i ∈ Ω̃_i} f(∼, V_i, ∼) = const_2

for any fixed value of V_j, that is, V_i does not interplay with V_j in the constrained space Ω̃_i. The above equation contradicts (24), so we conclude that the hypothesis does not hold. Finally, we derive the proposition that V_i and V_j contain variables from the same nonseparable group(s).
Remark 6: The proof logic is as follows. The contents in parentheses before "⇒" are the basis of deduction.

V_i locally interplays with V_j. (Definition 7) ⇒ V_i interplays with V_j in Ω̃_i. (Proof by contradiction) Assume that V_i and V_j have no variables from the same group(s). (Theorem 1) ⇒ V_i does not interplay with V_j in Ω̃_i. (Contradiction) ⇒ The assumption fails, so V_i and V_j have some variables from the same group(s).

Fig. 4. The independent local minimum points do not shift (no local interplay), determining that x_1 and x_2 are from different nonseparable groups.

The above theorem indicates that there is a nonseparable relationship between two variable sets if their independent local minimum points shift. Hence, some variables in the two variable sets should be grouped together.
Example 6: Similarly, an example of applying the above theorem is shown in Fig. 4. The function used here is a multiplicatively separable multimodal function (its detailed information is given in Section S.III of the Supplementary Material). Let x_1 be V_i and x_2 be V_j. When x_2 takes different fixed values, the independent local minimum points of x_1 are consistent, i.e., x_1 does not locally interplay with x_2. Then, according to Theorem 2, it follows that x_1 and x_2 are from different nonseparable groups.
Corollary of Theorems 1 and 2: Assume v_i, v_j, and v_k are three variables of a function. If v_i (locally) interplays with v_j and v_j (locally) interplays with v_k, but v_i does not (locally) interplay with v_k, then we say that v_i indirectly interplays with v_k. Clearly, according to Theorems 1 and 2, v_i, v_j, and v_k belong to the same nonseparable group.
Example 7: Indirect interplay can be exemplified by a function in which x_1 interplays with x_2 and x_2 interplays with x_3, while x_1 does not directly interplay with x_3; in such a function, x_1 indirectly interplays with x_3.

D. Guidelines for Applying the Principle
The proposed principle needs to be transformed into applicable guidelines so that it can be used in the grouping method. In this section, we give three guidelines for applying the minimum points shift principle.

1) If V_i^* is a (global or local) independent minimum point of V_i under V_j = V_j^{(a)}, while V_i^* is no longer an independent minimum point under V_j = V_j^{(b)}, we determine that V_i (locally) interplays with V_j. Then, the two sets contain variables from the same nonseparable group(s).
2) Contrary to the previous guideline, if V_i^* is a minimum point under both V_j = V_j^{(a)} and V_j = V_j^{(b)}, then the two sets have no variables from the same nonseparable group.
3) If there is indirect interplay between two variables, then they come from the same group.

It should be noted that Guideline 2 is not guaranteed to be correct. This is because "∃ V_j ∈ Ω_j" is required in the definition of interplay, which means that the absence of interplay requires "∀ V_j ∈ Ω_j", i.e., it requires that no minimum point is shifted. Since only two minimum points are detected in Guideline 2, we cannot guarantee that there is no interplay. However, in practice, Guideline 2 is accurate in most cases, so we still use it as a guideline for grouping.

III. PROPOSED GSG
The decomposition method proposed in this article is named GSG. The term "general" emphasizes that GSG is effective for both additively and nonadditively separable functions.
The goal of decomposition is to accurately divide the decision vector into all of its nonseparable groups. GSG applies the proposed minimum points shift principle to identify separability and groups all the variables accordingly.
When identifying separability, it is not easy to find the optimum of a variable set, so GSG applies the principle in a variable-set detection manner, where the (local) interplay between a single variable and a variable set is detected at a time. We call the optimization of a single variable a line search. Once the minimum point of the variable is found, we can determine the interplay between the variable and the variable set.
The general steps of GSG are as follows.

1) Minimum Points Search: A line search method, specifically the golden section search (GSS), is first used to find an independent (local or global) minimum point for each variable. The minimum point found is denoted as v_i^*, i = 1, 2, ..., n. When searching for minimum points by GSS, the other variables are fixed to the corresponding values in a predefined context vector (CV).
2) Interplay Detection: GSG detects interplay between a variable v_i and all ungrouped variables (denoted as V) at a time. To determine whether the minimum point shifts, the variables in V take values different from those in the CV. Then, we detect whether the previously found minimum point v_i^* changes by incrementally and decrementally perturbing v_i. If v_i^* changes, it means that v_i interplays with V.
3) Recursive Calls: GSG recursively calls the interplay detection procedure and narrows V until it finds all the variables that (locally) interplay with v_i. These (locally) interplaying variables form a variable group. This step is based on Guidelines 1 and 2.

4) Variable Groups Mergence: GSG checks whether any of the formed variable groups share variables (i.e., it checks for indirect interplay) and, if so, merges these variable groups. This step is based on Guideline 3.
To better illustrate GSG, the pseudo-code is shown in two parts: 1) the main framework of GSG (Algorithm 1) and 2) the recursive interplay detection procedure (RID, Algorithm 2).
Algorithm 1 GSG. Input: f, D, ub, lb, ε (precision of the golden section search), α (scale factor of the initial detection step). Output: S (set of captured separable variables), N (set of grouped nonseparable variables).

Algorithm 1 starts with a line search (lines 3-7) on each variable after initialization (lines 1 and 2) to find any of its minimum points. Line 4 takes the ith dimension as the variable being tested and fixes the remaining variables to the corresponding values in cv. The line search method used in GSG is the golden section search, which can obtain an arbitrary minimum point of a variable (and is applicable to black-box problems). The search accuracy (precision of the variables) of the GSS is ε, which is usually set to 10^−6. During the search, cv is gradually updated with the minimum points found (line 5), while an archival matrix C_arc stores the cv used (line 6). The reason why cv needs to be updated each turn is that a better CV facilitates the next line search. The CV of each search needs to be archived to ensure that the CV used in RID is consistent with the one used during the search. It should be noted that the search for the minimum points is performed in reverse order (line 3). In this way, the minimum points of the latter variables are found earlier and used to update the CV, which makes the CV better. The better the CV is, the easier it is to find the minima of the preceding variables. Then, in the grouping stage, the preceding variables have more exact minimum points, allowing the interplay detection procedure to identify separability more accurately and form groups more quickly.
The scale factor α is the only parameter that needs to be adjusted in GSG; it is used to control the smallest perturbation δ. δ is the initial perturbation for detecting a minimum point change and is set to α times the precision ε of the line search method, i.e., δ = α × ε. The appropriate range of α is 10^2 to 10^5. In this way, the perturbation magnitude is much larger than the precision of the line search, and thus the line search error is negligible.
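A minimal sketch of the line search stage is given below, assuming a plain golden section search over each variable's bounds with precision ε; the paper's actual implementation (CV archiving and the other bookkeeping of Algorithm 1) is more elaborate, so this only illustrates the idea.

```python
import numpy as np

PHI = (np.sqrt(5.0) - 1.0) / 2.0  # golden ratio factor, ~0.618

def golden_section_search(f, cv, i, lb, ub, eps=1e-6):
    """Minimize f along dimension i while the other variables stay fixed
    to the context vector cv. Returns an (approximate) minimum point."""
    def f1d(v):
        x = cv.copy()
        x[i] = v
        return f(x)

    a, b = lb, ub
    while b - a > eps:
        c, d = b - PHI * (b - a), a + PHI * (b - a)
        if f1d(c) < f1d(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def minimum_points_search(f, cv, lb, ub, eps=1e-6):
    """Stage 1 of GSG (sketch): line search on every variable in reverse
    order, updating the context vector with each minimum point found."""
    n = len(cv)
    v_star = np.empty(n)
    for i in reversed(range(n)):
        v_star[i] = golden_section_search(f, cv, i, lb[i], ub[i], eps)
        cv[i] = v_star[i]        # a better CV helps the remaining searches
    return v_star

# The initial detection perturbation used later by RID is delta = alpha * eps,
# e.g., alpha = 1e4 and eps = 1e-6 give delta = 1e-2.
```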
Lines 9-23 in Algorithm 1 describe the whole grouping process. The grouping is executed in the order of the variables. At the beginning, the algorithm generates a variable set V_i for detection by excluding the variables that are already in the same group as v_i in the existing group set N (line 10). Then, GSG finds the variables that interplay with v_i in V_i through the procedure RID (line 11) and makes them a group g (line 13). After that, GSG checks the existing set N for groups that overlap with g (i.e., finds indirect interplay) and merges them with g (lines 18 and 19). Finally, the formed variable groups are put into N, while the set of fully separable variables S is obtained by excluding all variables in N. The above is an overview of the grouping process.
Here, we focus on the RID procedure (Algorithm 2). RID captures the variables in V_i that interplay with v_i. It first judges whether V_i needs to be detected or not (lines 1-4). Then, the procedure loads the CV used in the line search stage from C_arc and sets it as the reference vector x (line 5). At this point, the ith dimension of x is an independent minimum point under this CV.
Thereafter, the variables in V_i are changed (line 6) in preparation for detecting the interplay between v_i and V_i. RID checks whether the minimum point of v_i changes after perturbation in its neighborhood. According to the guidelines in Section II-D, if the minimum points differ after changing V_i, it indicates that v_i interplays with V_i. Specifically, two vectors x_l and x_r are created (line 7) and made to increment and decrement by adding and subtracting δ to v_i (line 9). x_l and x_r are evaluated, and the increment or decrement δ is increased until their differences from x can both be expressed by the computer at minimum precision (line 8). Line 16 enters the interplay detection stage. If the reference vector x does not have the smallest fitness among x, x_l, and x_r, it means that the minimum point of v_i has changed after changing V_i. Therefore, it follows that v_i interplays with V_i, and vice versa. Afterward, RID continues to analyze which specific variables in V_i interplay with v_i. When there is more than one variable in V_i, RID splits V_i into two equally sized variable subsets (V_i1 and V_i2). The procedure recursively calls itself (lines 21 and 22) to identify interplay between v_i and the split subsets until all the interplaying variables are found. Finally, RID delivers all the found interplaying variables to the upper-level function, where they are merged into g for output.
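A minimal sketch of the recursive interplay detection idea follows; it is an assumption-level illustration (function and variable names such as rid and is_still_minimum are ours, not the paper's code), showing only the perturbation test and the bisection recursion.

```python
import numpy as np

def is_still_minimum(f, x_ref, i, delta):
    """Guidelines 1 and 2: check whether the ith entry of x_ref is still a
    local minimum point by perturbing it by +/- delta."""
    x_l, x_r = x_ref.copy(), x_ref.copy()
    x_l[i] -= delta
    x_r[i] += delta
    return f(x_ref) <= f(x_l) and f(x_ref) <= f(x_r)

def rid(f, x_ref, i, V, delta, new_values):
    """Recursive interplay detection (sketch). Returns the variables in V
    that (locally) interplay with variable i."""
    if not V:
        return set()
    x = x_ref.copy()
    for j in V:
        x[j] = new_values[j]          # change the context of the set V
    if is_still_minimum(f, x, i, delta):
        return set()                  # no shift: no interplay detected with V
    if len(V) == 1:
        return set(V)                 # the single remaining variable interplays
    mid = len(V) // 2                 # otherwise, bisect V and recurse
    return (rid(f, x_ref, i, V[:mid], delta, new_values)
            | rid(f, x_ref, i, V[mid:], delta, new_values))
```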
To summarize, GSG obtains the interplay relationships of all variables through multiple detections and then forms the variable groups. Finally, GSG checks whether any groups share variables and, if so, merges those groups.

IV. NEW BENCHMARK BASED ON NONADDITIVE SEPARABILITY
Among the widely used CEC LSGO benchmarks, only Ackley's function and its extended test problems are nonadditively separable. To address the lack of nonadditively separable problems, we design a new benchmark based on nonadditive separability (BNS), which is built from the basic functions introduced below.

A. Basic Functions
The following four basic functions are all nonadditively separable, i.e., they are separable but not additively separable. In our analysis of nonadditive separability (Section II-B), two new separability forms, multiplicative separability and composite separability, are identified in addition to the conventional additive separability. Accordingly, we design four basic separable functions corresponding to these two forms: the former two functions exhibit multiplicative separability, whereas the latter two exhibit composite separability. The four basic functions are as follows.
1) Product of Square Function: a multiplicatively separable function built from squared terms x_i^2. Detailed information (including the ranges, global optima, and modality) on these four basic functions and their 2-D plots can be found in Section S.III of the Supplementary Material.

B. Proposed Benchmark BNS
There are 12 test problems in BNS, which are listed in Table I. They are all transformed from the basic functions with three different separability degrees: 1) fully separable; 2) partially separable on half of the variables; and 3) partially separable on all variables.
First, all variables are split into m groups according to the order of a random permutation P of the variable indexes. Then, to produce nonseparability, a rotation transformation M is applied to each variable group; separable groups can be transformed into nonseparable groups by this transformation. Finally, we connect the nonseparable groups by multiplication or by a composite function. This is a new connection method, different from the additive connection.
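The following is a hedged sketch of this construction (the specific base function, coefficients, and bounds are assumptions, not the actual BNS definitions): variables are permuted into groups, each group is rotated to induce intra-group nonseparability, and the group values are connected multiplicatively instead of additively.

```python
import numpy as np

def make_bns_like_problem(n=1000, group_size=50, seed=0):
    """Build a nonadditively separable test problem in the BNS style
    (illustrative only; not the official BNS definitions)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                       # random permutation P
    groups = [perm[i:i + group_size] for i in range(0, n, group_size)]
    rotations = [np.linalg.qr(rng.standard_normal((group_size, group_size)))[0]
                 for _ in groups]                   # rotation matrices M
    shift = rng.uniform(-1.0, 1.0, n)               # shifted optimum o

    def f(x):
        value = 1.0
        for g, M in zip(groups, rotations):
            z = M @ (x[g] - shift[g])               # rotate each group
            value *= 1.0 + 1e-3 * np.sum(z ** 2)    # positive subfunction,
        return value - 1.0                          # connected by multiplication

    return f, shift                                 # global minimum at x = shift

f, opt = make_bns_like_problem(n=100, group_size=10)
print(f(opt))   # 0.0 at the shifted optimum
```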
All problems have a global minimum point o (a randomly shifted vector) and an optimum value of 0. The coefficients in these test problems keep the value domains of the problems within a reasonable range, avoiding function values that are too large for the computer to represent when the dimensionality is extended.

V. EXPERIMENTAL STUDY
In this section, we experimentally test the grouping performance of GSG and the optimization performance obtained when the test problems are grouped by GSG. The experiments are divided into two parts: a comparison on decomposition and a comparison on optimization.
In general, the main purposes of this section are as follows.

1) Explore the decomposition effectiveness of the proposed algorithm on additively and nonadditively separable problems.

2) Confirm that nonadditively separable problems are also suitable for optimization with proper problem decomposition.

A. Comparison on Decomposition
In this section, we experimentally compare the comprehensive performance of the proposed GSG with some state-of-the-art decomposition methods (DG2 [23], RDG [29], and ERDG [30]). The comparison items are the decomposition accuracy on separable and nonseparable variables and the computational resource consumption. The benchmarks used in this experiment are BNS, CEC2010 [44], and CEC2013 [40]. The parameter α of GSG is set to 10^4, and the parameters of the other algorithms are set to the default values in their respective papers.
1) Accuracy Metric: The decomposition accuracy is divided into the decomposition accuracy for separable variables and that for nonseparable variables, both calculated from the grouping results. In the grouping results, the set of separable variables is denoted by sep, while the nonseparable variables form k groups, denoted by g_i. The true grouping is obtained from the code of the test functions. The true separable variable set is denoted by sep^*, while the true nonseparable groups are denoted by g_i^*, with a total of m groups. The decomposition accuracy for separable variables is defined as

ρ_sep = |sep| / |sep^*|

i.e., the ratio of the number of separable variables obtained to the true number of separable variables. The decomposition accuracy for nonseparable variables is defined as

ρ_nonsep = (1/m) Σ_{i=1}^{m} ( max_{1≤j≤k} |g_j ∩ g_i^*| ) / |g_i^*|

that is, for each true nonseparable group g_i^* we find the resultant group g_j that is most similar to it, count the number of common variables, and average the resulting proportions over the m true groups.

2) Decomposition on Nonadditively Separable Problems: In this section, the performance of GSG is compared with the decomposition methods DG2, RDG, and ERDG on the proposed new benchmark BNS. The dimensionality of the test problems is set to 1000. The experimental results are listed in Table II.
The results show that GSG achieves a decomposition accuracy of 100% on all functions. On the contrary, almost all the other methods achieve 0% for the decomposition of separable variables, while for nonseparable variables, the other three methods achieve only 10% on f_5-f_8 and 5% on f_9-f_12. This is because these problems are judged to be fully nonseparable functions, resulting in a single variable group of size 1000. The other three methods cannot capture nonseparable groups connected by multiplication or by a composite function.
3) Decomposition on Classical Problems: In addition to the comparisons on nonadditively separable problems, the proposed GSG needs to be tested on the classical CEC2013 and CEC2010 LSGO problems to show its effectiveness on common additively separable problems. The dimensionality of the test problems is also 1000. The comparison results are shown in Table III. It is worth noting that f_3 and f_6 from CEC2013 and f_3, f_6, and f_11 from CEC2010 are constructed from a nonadditively separable function (Ackley's function), while all the other problems are based on additive separability. CEC2013 is the most challenging and comprehensive LSGO benchmark so far. From Table III, the results show that GSG outperforms DG2, RDG, and ERDG in terms of the decomposition accuracy on separable variables, but GSG is not better than the other three methods on nonseparable variables. Specifically, as for the separable variables, GSG has an accuracy of 0% on f_6, while the other three methods have almost 0% accuracy on f_3 and f_6. The reason for GSG's failure is that the special transformation introduced in CEC2013 causes a large number of ill-behaved local optima, which affects GSG's detection of the minimum points shift. The reason for the other three methods' failure is that they all fail to detect the nonadditively separable parts (Ackley's function). On the other hand, GSG performs poorly on f_6, f_8, and f_10 for nonseparable variables because computational errors and the setting of the CV may cause low accuracy. Similarly, RDG and ERDG also perform poorly on these problems because of their high complexity.
We next discuss the performance of each algorithm on CEC2010. GSG achieves an accuracy of 100% on all problems, while the other three methods perform poorly on the separable variables of f_3, f_6, and f_11. These three problems are similar to f_3 and f_6 of CEC2013 in that they are based on Ackley's function, and the other three methods cannot detect nonadditive separability.
4) Decomposition on Higher-Dimensional Problems: In this section, we investigate the scalability of GSG. We test the decomposition accuracy and computational resource consumption of GSG and the three other methods on 1000-D to 5000-D LSGO test problems. The benchmarks for the scalability test are BNS and CEC2010, because the CEC2013 problems are fixed and not suitable for extending the dimensionality. The results shown in Table IV are the average decomposition accuracy and average computational resource consumption over all the problems in each benchmark. Furthermore, the detailed decomposition results on the different higher-dimensional problems are given in Section S.IV of the Supplementary Material.
As observed from Table IV, the decomposition accuracy of GSG remains much higher than that of the other three methods as the dimensionality increases. The other three methods basically treat the test problems as fully nonseparable problems and thus achieve poor results. In addition, GSG achieves 100% accuracy for all dimensionalities on CEC2010, while the other methods still obtain unsatisfactory decomposition results on separable variables.

5) Computational Resource Consumption: GSG consumes more computational resources than the state-of-the-art decomposition methods. GSG usually consumes on the order of 10^4 FEs for 1000-D problems, while ERDG and RDG require on the order of 10^3. The main reason is that GSG spends a large amount of computational resources in the initial search for minimum points.
Table V shows GSG's computational resource consumption on each benchmark. For the 1000-D problems, GSG consumes about 5 × 10^4 FEs on average. Among them, the FEs spent by GSS to find the minimum points reach 3 × 10^4, which is about 60% of the total consumption. We can estimate that each call to GSS costs roughly 35 FEs. The detailed FE consumption on the different problems is given in Section S.VI of the Supplementary Material.
However, although GSG consumes more computational resources, it can solve nonadditively separable problems that cannot be handled by the state-of-the-art decomposition methods (e.g., RDG and ERDG). Moreover, the finite differences principle used by these state-of-the-art algorithms underwent nearly a decade of improvement before its computational resource consumption was significantly reduced [18], [28], [29], [32]. Specifically, the computational complexity of DG-series methods has been reduced from O(n^2) for DG2 to O(n log_2 n) for ERDG. Therefore, more research attention should be devoted to computational resource consumption in future research on GSG.

Finally, we summarize the decomposition performance of GSG. As a decomposition method using a new separability identification principle, the proposed GSG performs quite well on both nonadditively separable problems and additively separable problems. In particular, GSG can effectively deal with nonadditively separable problems, which the state-of-the-art decomposition methods are unable to do.
It should be mentioned that the success of GSG demonstrates the effectiveness of the minimum points shift principle, which breaks through the previous dilemma of separability identification principles in dealing with nonadditive separability.

B. Comparison on Optimization
In this section, we experimentally compare the optimization results obtained with different grouping methods. The focus of the optimization experiments is to confirm that the problem decomposition strategy is applicable to nonadditively separable problems, just like common additively separable problems. The experiments compare the optimization results of four grouping strategies on the 1000-D and 5000-D nonadditively separable test problems. The grouping strategies used are described as follows.
1) Random Grouping (RG): RG randomly divides the decision variables into subcomponents of size 100.

2) No Grouping (NG): NG does not group the decision variables. This strategy provides almost the same grouping results as DG2, RDG, and ERDG on nonadditively separable problems.

3) GSG: The grouping results generated by GSG are equivalent to the ideal grouping. In addition, all fully separable variables are regrouped into subgroups of size 100 to enhance the optimization effect.

4) GSG-SepOpt: A variant of GSG that utilizes the minimum points found during the line search (GSS) to assist optimization. In the optimization process, the fully separable variables inherit the minimum points found by GSG, i.e., all separable variables of an individual are set to these minimum points during population initialization.
The test problems used in this experiment are all nonadditively separable, including all problems of BNS, f_3 and f_6 of CEC2013, and f_3, f_6, and f_11 of CEC2010. The experiments are not conducted on the other problems from the CEC benchmarks because the GSG results on them are similar to those of the other mainstream grouping methods. The optimizer used in the experiments is DECC [4]. Each result is derived from 25 independent runs, and results with a significant advantage (Wilcoxon rank-sum test at the 0.05 significance level) are indicated in bold.
1) Optimization on 1000-D Problems: The experimental results on the 1000-D problems are provided in Table VI. The FE budget for each problem is 3 × 10^6.
The results show that DECC with GSG is significantly better than with RG and NG. RG achieves first place on two problems, while NG (equivalent to DG2, RDG, and ERDG here) achieves first place on only one problem. By contrast, GSG achieves 11 first places, while GSG-SepOpt achieves 14. These results confirm that nonadditively separable problems are suitable for the problem decomposition strategy. GSG can identify fully separable variables and group them into subgroups of appropriate size, which helps to improve the optimization performance [45]. Moreover, the variant GSG-SepOpt is slightly better than GSG, especially on unimodal problems: the minimum points obtained by GSS on a unimodal problem are equivalent to the globally optimal points. However, this variant does not work well on multimodal problems, such as f_2 and f_6 of BNS and the five Ackley-based problems in the CEC benchmarks.
2) Optimization on 5000-D Problems: The experimental results on the 5000-D problems are provided in Table VII. The FE budget for each problem is 1.5 × 10^7. The CEC2013 benchmark is not tested because it cannot be extended to a dimensionality of 5000.
As observed, the results of optimizing the test problems grouped by GSG are significantly better than those obtained using RG or NG. The overall optimization performance is similar to that on the 1000-D test problems. GSG achieves first place seven times, while GSG-SepOpt achieves first place 11 times. In contrast, RG achieves only two first places, while NG's optimization is the worst almost every time. Overall, using GSG for problem decomposition can significantly improve the optimization of nonadditively separable problems. When dealing with unimodal problems, the variant GSG-SepOpt shows excellent performance.

C. Parameter Sensitivity Analysis
In this section, we investigate the effect of the parameter α on the performance of GSG. The parameter α is a scale factor in RID, which is used to control the starting step for detecting the shift. The detailed results and discussions are given in Section S.V of the Supplementary Material.
In general, the parameter α has only a slight influence on the decomposition accuracy of GSG. A reasonable setting of α is in the range 10^2 to 10^5.

VI. CONCLUSION
In this article, we conduct a comprehensive theoretical investigation on separability, which includes basic definitions, separability forms, and a separability identification principle. Based on the innovative principle, we design a novel grouping method, called GSG, which can effectively decompose additively and nonadditively separable problems, the latter of which the state-of-the-art decomposition approaches fail to handle.
First, we make in-depth theoretical analyses of separability. We introduce the concept of nonadditive separability, provide two new forms of nonadditively separable functions, and give the definition of strong separability. Based on these concepts and definitions, we propose the minimum points shift principle (including two critical theorems) for identifying general separability.
Next, we design a decomposition method called GSG based on the proposed minimum points shift principle. GSG identifies interplay by detecting the shift of the independent minimum points found by GSS. The interplaying variables are grouped recursively to create the final variable groups.
Then, we create a new LSGO benchmark called BNS, in which the problems are constructed based on nonadditive separability. BNS contains 12 test problems, and its dimensionality can be extended. BNS makes up for the lack of nonadditively separable problems in previous benchmarks.
Finally, we experimentally verify the effectiveness of the proposed GSG. GSG is compared with some state-of-the-art decomposition methods on BNS, CEC2013, and CEC2010. The results show that GSG can effectively decompose various problems, and its overall accuracy is higher than that of the other advanced decomposition approaches. In addition, an optimization test is conducted, which confirms that CC optimization with GSG is indeed applicable to nonadditively separable problems.
The proposed decomposition method GSG is able to effectively tackle generally separable problems, especially nonadditively separable ones that cannot be handled by the state-of-the-art decomposition methods. However, the main drawback of GSG is its high computational cost, which mainly comes from the line search stage.
In the future, we will attempt to reduce the computational resource consumption of GSG by applying the principle more efficiently, e.g., finding a more practical minimum-point search method, utilizing a more efficient grouping mechanism, and combining GSG with DG-series methods. Another promising improvement is to integrate the problem decomposition stage into the optimization stage.

TABLE II: Decomposition results of each algorithm on the nonadditively separable problems in the proposed BNS. SA represents the decomposition accuracy for separable variables, while NA represents the decomposition accuracy for nonseparable variables.

TABLE III: Decomposition results of each algorithm on CEC2013 and CEC2010. SA represents the decomposition accuracy for separable variables, while NA represents the decomposition accuracy for nonseparable variables.

TABLE IV: Decomposition results of each algorithm on higher-dimensional problems in BNS and CEC2010. The results shown are the average decomposition accuracy and average computational resource consumption over all the problems in each benchmark.

TABLE V: Details of GSG's computational resource consumption on each benchmark in 1000 dimensions. The values are averages over all the problems in each benchmark.

TABLE VI: Optimization results of various grouping methods on 1000-D nonadditively separable problems. Each grouping approach is paired with DECC as the optimizer.

TABLE VII: Optimization results of various grouping methods on 5000-D nonadditively separable problems. Each grouping approach is paired with DECC as the optimizer.