Search in a fitness landscape: How to assess the difficulty of a search problem

Computational modelling is widely used to study how individuals and organizations search and solve problems in fields such as economics, management, cultural evolution, and computer science. We argue that current computational modelling research on problem-solving needs to address several fundamental issues in order to generate more meaningful and falsifiable contributions. Based on comparative simulations and a new type of visualization for assessing the nature of the fitness landscape, we address two key assumptions that approaches such as the NK framework rely on: that the NK model captures the continuum of complexity of empirical fitness landscapes, and that search behavior is a distinct component, independent of the topology of the fitness landscape. We show the limitations of the most common approach to conceptualizing how complex, or rugged, a landscape is, and we show how the nature of the fitness landscape is fundamentally intertwined with search behavior. Finally, we outline broader implications for how to stimulate problem-solving.


Introduction
"Solving a problem simply means representing it so as to make the solution transparent" (1). There is a long tradition of studying how to search for solutions to 'hard' problems, i.e. problems where it is computationally impossible or merely too expensive to list and test all possible solutions (2,3). The prevalent way of addressing individual or organizational search behaviour and of conceptualizing the space of solutions stems from early work on population genetics, namely the fitness landscape model (4). By focusing on fitness interactions between genes, Wright's framework allows for a link between low-level properties of genes and the high-level patterns of the dynamics of evolution (5). The model's most famous extension, the NK model (6), explicitly models adaptive evolution as a "search in protein space" (6) which tries to find a maximum point for a chosen fitness function. This approach has spread beyond the boundaries of the population genetics literature and inspired a series of scholars from computer science (7), organizational theory (8), economics (9), cultural evolution (10) and physics (11,12) to computationally model complex, adaptive systems.
How can problem-solving be modelled in this framework? Imagine trying to solve an innovation problem, for instance designing a new educational app. The app will likely be based on pre-defined libraries, which constitute interconnected modules; changing something in one module might influence the functionality of another module. The extent of this interdependence will differ from environment to environment, and will influence how one should search for a good design. Sometimes changing one small element at a time ('local search') might be efficient, while in other environments the level of interdependency might make this approach inefficient. Levinthal (13) introduced the NK model to the social science literature in order to facilitate formal modelling and simulation of how the level of interdependence of an organization's activities affects its long-term chances of finding the optimal solution to a problem and thus surviving in a competitive environment. By making explicit assumptions about individual or organizational behaviour and the environment in which the agent evolves, researchers could now simulate how such agents adapt over time. As in the study of genes in biology, this allows one to map the complex dynamics of agents being embedded in and adapting to the competitive environment (8,14-18). The original NK model (6) assumes a landscape where one can modify the complexity by varying the interdependence of elements (the K of the NK). This ability to vary the interdependence is the model's great strength, compared to less flexible simulation frameworks such as armed bandits (19). Furthermore, the model also assumes that the searcher primarily engages in local search akin to the one-bit flip mutations of a gene. We want to address these assumptions in a social context and offer a technical, comparative analysis of search and the empirical landscapes we intend to model. First, we discuss how social science studies usually conceptualize how complex (i.e. difficult to search) a given landscape is. However, we show that the commonly used measure for how rugged a landscape is does not capture how social science landscapes might be ordered by a hierarchy (20) or how neutrality influences ruggedness (16). Second, we provide a novel (to the social science literature) type of visualization that maps how different search strategies actually 'generate' different landscapes, rather than merely constituting search in an a priori given space. The conceptualization of the fitness landscape is thus not independent of assumptions about search behavior. This interdependence was not an issue in the original NK model, since it assumed a particular search strategy of genes engaging in local search with occasional random jumps (6). In contrast, if one changes the parameters of search behavior, the fitness landscape, and hence its difficulty, can change.
Overall, we argue that in line with recent trends in biology and computer science, we need to move beyond a simple numerical classification of the ruggedness of a fitness landscape. Focus should be on a categorization of the type of fitness landscape, which will allow the selection of relevant algorithms (21). This also entails further and wider comparisons of the NK fitness landscape features with characteristics found in the empirical world (22) as well as acknowledging that search behavior cannot be isolated from such a comparison between the modelled and the real world landscapes.

Search in NK fitness landscapes
The fitness landscape is what the solver subjectively perceives (23). There are two main elements in the fitness landscape model that need to be specified for problem-solving processes to be captured: the task structure (i.e. the problem that is to be solved) and the search behaviour (i.e. how problem-solving unfolds). In each step of the simulation, the agent follows pre-specified search rules in an attempt to find the optimal solution, i.e. the peak of the fitness landscape. The landscape is a mapping between solutions and fitness values that takes into account the connectivity between solutions in the search space. In order to define this connectivity, we need to specify a distance metric that determines how agents can move between different positions in the search space.

K/N ratios
Early social science studies relying on the NK model (see Appendix 1) follow the path proposed by Kauffman (6) and study how the attributes of the search space influence the propensity of a one-bit flip hill climber to find the optimal solution (13,24). An agent would always try to hill-climb, but in a landscape consisting of many local peaks this is a challenge, and the agent might become stuck before reaching the global peak. The K/N ratio is an attempt to describe how rugged a landscape is. Part of the NK model's popularity in the organizational literature is due to the fact that it allows the investigation of different problem difficulties (25), via the K/N ratio, or the level of epistatic interactions (26). Epistasis is equivalent to the non-linearity of a problem, or how well a problem can be decomposed into sub-problems (Pitzer & Affenzeller, 2012; Rothlauf, 2011). Consider an N=4, K=1 example. A solver with {$x = (0,0,0,1)$, $f(x) = 0.56$} might move to {$x_1 = (0,0,0,0)$, $f(x_1) = 0.58$}, even though {$x_2 = (0,0,1,1)$, $f(x_2) = 0.72$}. The fact that the optimal setting for the fourth allele is {1} is obscured by the epistatic interaction with its neighbour in the third position. Generally, studies in the NK field have assumed and tried to show that the higher the K/N ratio, the more rugged the landscape, and the more difficult it is for the agent to search in it (27). This assumption is based on the idea that landscapes are either smooth or rugged, and that no other features of a landscape interfere with the assessment of its difficulty. The question now is how useful this K/N measure is. Does it have predictive power, and does it emulate features we can expect to encounter in the empirical world?
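To make the setup concrete, the following is a minimal Python sketch of an NK fitness function and a steepest-ascent one-bit flip hill climber. It is an illustration under stated assumptions, not the implementation used for the simulations reported here: the cyclic, adjacent-neighbour interaction pattern, the uniform random contributions, and the names `make_nk_landscape`, `hill_climb` and `success_rate` are all hypothetical choices made for the example.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function for one random NK landscape.

    Assumption: locus i interacts with its k right-hand neighbours (cyclic).
    """
    rng = random.Random(seed)
    # One lookup table per locus with 2**(k+1) contributions drawn from [0, 1).
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | bits[(i + j) % n]
            total += tables[i][idx]
        return total / n

    return fitness


def hill_climb(fitness, start):
    """Steepest-ascent one-bit flip hill climber; returns the peak it reaches."""
    current = tuple(start)
    while True:
        neighbours = [
            current[:i] + (1 - current[i],) + current[i + 1:]
            for i in range(len(current))
        ]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(current):
            return current  # no strictly better neighbour: a local peak
        current = best


def success_rate(n, k, seed=0):
    """Share of all starting points from which the climber reaches the global peak."""
    f = make_nk_landscape(n, k, seed)
    space = list(itertools.product((0, 1), repeat=n))
    global_best = max(f(x) for x in space)
    hits = sum(1 for x in space if f(hill_climb(f, x)) == global_best)
    return hits / len(space)


if __name__ == "__main__":
    for k in (0, 1, 3, 7):
        print(k, success_rate(8, k))
```

With K=0 the landscape is additive, so every starting point should lead to the global peak; for larger K the share typically drops, but the exact values depend on the random seed and on the assumed interaction pattern.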
Despite the fact that the notion of epistatic interactions, as outlined above, is widely used in e.g. organizational theory, its use in quantifying the difficulty of a problem has been criticized (28,29) in particular due to the difficulty of identifying measures of epistasis that have adequate predictive power (7). There are at least two main limitations to using epistasis measures as proxies for problem complexity. First, epistatic interactions can be both positive and negative. Whether an interaction effect between two alleles is positive or negative has a significant impact on the difficulty of a problem, but epistasis measures (e.g. epistasis variance or correlation) cannot capture this distinction (30). Second, empirical evidence suggests that epistatic interactions can occur at several levels, i.e. there are hierarchical interdependence structures. This has consequences for the long-term dynamics of the system (31).
Paralleling the above investigations into the value of using epistasis measures as proxies for problem complexity, there has been considerable development in the study of fitness landscapes (7,21,32,33). A focus has emerged on whether our categorization of fitness landscapes captures the features we can expect to encounter in the empirical world. Thus, rather than quantitatively characterizing problem difficulty (via ruggedness measures such as the K/N ratio), the aim is to categorize fitness landscapes into different types in order to determine the appropriate algorithm (21). The shift is due to fitness landscape analysis allowing for "a deeper understanding of a whole problem class" (7) rather than a specific problem instance. Current research thus seeks to identify relevant features that can describe a fitness landscape and that have known properties with respect to problem-solving difficulty (34). For example, Malan and Engelbrecht (34) point out two relevant features beyond the difference between smoothness and ruggedness that also influence a searcher's ability to navigate the landscape and reach the optimal solution: neutrality and deceptiveness (Fig. 1).

Fig. 1 Landscape features adapted from Malan & Engelbrecht 2013. a) Smooth landscape. b) Rugged landscape. c) Deceptive landscape. d) Neutral Landscape
In line with the prevalent approach in organizational theory, where the one-bit hill-climbing algorithm is the dominant search behavior, we describe in the following how these two features (deceptiveness and neutrality) can affect the likelihood of finding the optimal solution for a classic one-bit hill-climbing algorithm. The features' potential impact is not limited to this search heuristic.

Deceptiveness
Recent advances in biology point to the existence of higher-order epistatic interactions, which generate multidimensional landscapes (35,36). These interactions seem to be organized hierarchically in functional modules that interact with each other (36,37). This type of interaction structure is reminiscent of the hierarchical structure that has been argued to be an essential feature of organizational problems, at least when it comes to innovation problems (20,38). In this context, hierarchy is conceptualized as the composition of systems out of subsystems, with each subsystem in turn having its own hierarchy (39), until a certain level of fine-grained modularity is achieved. This is a qualitatively different kind of 'problem complexity' (as compared to landscape 'ruggedness') and the one most likely to be encountered in real-life problems (38-40). In other words, hierarchical decomposition and hierarchical interdependence are different from the one-level interdependence that is captured by NK-like landscapes; see also Marengo and Dosi (41) for a more detailed account. Importantly, a K/N ratio does not capture whether a problem is hierarchical.
Such hierarchical problems are likely to generate deceptive landscapes (cf. Fig. 1c), according to Malan and Engelbrecht's (34) classification, since they generate so-called hierarchical traps where local search gets stuck (40,42,43). The interactions between building blocks make hierarchical problems deceptive (i.e. misleading according to Jones and Forrest (44)) in Hamming space at lower hierarchical levels, but fully non-deceptive at higher hierarchical levels (45). Empirical studies in computer science analyzing how different computational algorithms solve computer games also reveal that a measure of complexity does not capture the deceptiveness of the game (46).

Neutrality
The metaphor of a rugged NK landscape focuses on the smooth vs. rugged distinction, and on how to quantitatively measure the ruggedness. A different intuition about how evolutionary dynamics might be influenced by the underlying fitness function emerges from models that consider the possibility that some solutions have equal fitness. This was fueled by developments in molecular biology, which have questioned the 'rugged landscape' metaphor, in particular its explanation of speciation (47,48). This work was largely driven by the neutral theory of molecular evolution and in particular the observation that the majority of mutations at a molecular level do not affect the phenotype (16,49). The traditional NK framework assumes that once a population has become stuck on a suboptimal peak, it can only escape if the fitness function changes (e.g. shifting balance theory) or via a long jump. The neutral theory of molecular evolution relies on the conjecture that there must be a series of fitness-neutral mutations that would allow even organisms currently located on a suboptimal peak to 'escape' and undergo further evolution. Neutrality has also been observed in quantum physics experiments, where it rendered sequences of 1D parameter optimizations unproductive (11). A number of authors have introduced neutral extensions of the NK landscape and investigated how the new topology might influence evolutionary processes (16,47,50,51). The implementations vary in both details and conclusions regarding the influence of neutrality on the features of the landscape (52), but they do conclusively show that neutrality is an important feature that influences search performance and is not captured by the traditional measures of ruggedness (7) commonly used in NK studies.
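As a rough illustration of how neutrality can enter an NK-style landscape, the sketch below quantises the fitness contributions to q integer levels and measures the share of one-bit flips that leave fitness unchanged. This is only one possible way of introducing neutrality and is not claimed to match any of the cited extensions in detail; the cyclic interaction pattern, the unnormalised integer fitness, and the names `make_quantised_nk` and `neutral_fraction` are assumptions made for the example.

```python
import random
from itertools import product


def make_quantised_nk(n, k, q, seed=0):
    """NK-style fitness whose contributions are integers in {0, ..., q-1}.

    Assumption: locus i interacts with its k right-hand neighbours (cyclic).
    Fitness is left as an unnormalised integer sum so that ties are exact.
    """
    rng = random.Random(seed)
    tables = [[rng.randrange(q) for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(bits):
        total = 0
        for i in range(n):
            idx = 0
            for j in range(k + 1):
                idx = (idx << 1) | bits[(i + j) % n]
            total += tables[i][idx]
        return total

    return fitness


def neutral_fraction(n, k, q, seed=0):
    """Share of (solution, one-bit neighbour) pairs with identical fitness."""
    f = make_quantised_nk(n, k, q, seed)
    neutral = 0
    total = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]
            total += 1
            if f(y) == fx:
                neutral += 1
    return neutral / total


if __name__ == "__main__":
    for q in (2, 4, 1000):
        print(q, neutral_fraction(8, 3, q))
```

Coarser quantisation (small q) should yield a larger share of neutral moves at the same N and K, which is the sense in which neutrality varies independently of the K/N ratio.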

Landscape ruggedness: modality and locality measures
We have addressed the pitfalls of using K or K/N ratios as measures of landscape ruggedness (the common approach in the social sciences) as well as identified features that can impede any classifications of ruggedness. While it is clear that K influences how an agent is to search a given landscape, it is not clear how much epistasis "is needed to make a problem difficult" (53). Thus, we present a number of alternative approaches to capture landscape ruggedness.
In computer science, a frequently used measure of landscape ruggedness is the number of local maxima, or the modality of a landscape. The modality of a given landscape is often computed relative to the size of the fitness landscape: the higher the density of such local optima, the more complex the problem, i.e. the higher the likelihood that a solver will be stuck and unable to find the optimal solution. Note that the definition of a distance metric (and implicitly the neighborhood function) affects the number of local optima, since, by definition, for a problem $(X, f)$ and a neighborhood function $\mathcal{N}$, a solution $x^*$ is called locally optimal with respect to $\mathcal{N}$ if $f(x) \le f(x^*)$ for all $x \in \mathcal{N}(x^*)$.
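The dependence of modality on the neighborhood function can be checked by brute force on a small example. The sketch below counts local optima, using the non-strict definition given above, for a toy 'count the ones' fitness function on four bits under two neighborhoods: exactly one bit flipped and exactly two bits flipped. The toy function and the names `flip_neighbours`, `count_local_optima` and `onemax` are illustrative assumptions, not part of the NK model.

```python
from itertools import combinations, product


def flip_neighbours(x, k):
    """All bit strings at Hamming distance exactly k from x."""
    result = []
    for positions in combinations(range(len(x)), k):
        y = list(x)
        for i in positions:
            y[i] = 1 - y[i]
        result.append(tuple(y))
    return result


def count_local_optima(n, fitness, k):
    """Number of x* with f(x) <= f(x*) for every x in the k-flip neighbourhood."""
    count = 0
    for x in product((0, 1), repeat=n):
        fx = fitness(x)
        if all(fitness(y) <= fx for y in flip_neighbours(x, k)):
            count += 1
    return count


def onemax(x):
    """Toy fitness: the number of ones in the string."""
    return sum(x)


if __name__ == "__main__":
    print(count_local_optima(4, onemax, 1))  # prints 1
    print(count_local_optima(4, onemax, 2))  # prints 5
```

Under one-bit flips only the all-ones string is locally optimal, whereas under two-bit flips the four strings with a single zero also qualify, because no two-bit move from them strictly improves fitness. The fitness function is identical in both cases; only the neighborhood differs.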
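Another perspective relies on the locality of a landscape, which is given by how closely together (with respect to the distance d) solutions with similar fitness values are located (54). In general, the lower the distance, the higher the locality and the easier it is to find a global optimum, since better solutions are located closer together (7). One way of quantitatively measuring the locality of a landscape is the fitness distance correlation (FDC) coefficient (44):
$$\mathrm{FDC} = \frac{c_{fd}}{\sigma(f)\,\sigma(d_{opt})}, \qquad c_{fd} = \frac{1}{m}\sum_{i=1}^{m}\left(f_i - \bar{f}\right)\left(d_{i,opt} - \bar{d}_{opt}\right),$$
with $m$ the number of solutions considered, $\sigma(f)$ and $\sigma(d_{opt})$ the standard deviations of the fitness values and of the distances to the optimal solution, $\bar{f}$ the mean value of the fitness function, $\bar{d}_{opt}$ the mean value of the distance to the optimal solution, $f_i$ the fitness value of solution $i$, and $d_{i,opt}$ the distance of solution $i$ to the optimal solution $x^*$.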
The fitness distance correlation coefficient allows Jones and Forrest (44) to distinguish between three classes of landscapes:
Straightforward, FDC ≤ −0.15. This is the ideal case where the closer a solver gets to the global optimum, the higher the fitness. These cases correspond roughly to 'smooth' landscapes. NK problems where K ≤ 3 fall in this category.
Difficult, −0.15 < FDC < 0.15. There is limited correlation between the fitness difference and the distance to the optimal solution. This makes such optimization problems very hard to solve and reduces search heuristics to random search. According to Jones and Forrest (44), as K increases over 3, NK landscapes quickly become uncorrelated and FDC approaches 0. These are 'rugged' landscapes, with limited or uncorrelated ruggedness.
Misleading, FDC ≥ 0.15. Fitness tends to increase with the distance to the optimal solution, i.e. there is an inverse correlation between the fitness difference and the distance to the optimal solution. Thus, the solver is drawn away from the global optimum. According to Malan and Engelbrecht's (34) classification, these are 'deceptive landscapes'.
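The three classes can be illustrated with a short Python sketch that computes the fitness distance correlation over a fully enumerated search space and applies the ±0.15 thresholds stated above. The two toy fitness functions (a 'count the ones' function and a fully deceptive trap) and the names `fdc`, `classify`, `onemax` and `trap` are assumptions introduced for the example; they are not the landscapes analysed in this paper.

```python
import math
from itertools import product


def fdc(fitnesses, distances):
    """Pearson correlation between fitness and distance to the global optimum."""
    m = len(fitnesses)
    f_bar = sum(fitnesses) / m
    d_bar = sum(distances) / m
    cov = sum((f - f_bar) * (d - d_bar) for f, d in zip(fitnesses, distances)) / m
    sd_f = math.sqrt(sum((f - f_bar) ** 2 for f in fitnesses) / m)
    sd_d = math.sqrt(sum((d - d_bar) ** 2 for d in distances) / m)
    return cov / (sd_f * sd_d)


def classify(r):
    """Three classes for a maximisation problem, using the +/-0.15 thresholds."""
    if r <= -0.15:
        return "straightforward"
    if r >= 0.15:
        return "misleading"
    return "difficult"


def onemax(x):
    """Toy fitness: the number of ones."""
    return sum(x)


def trap(x):
    """Toy deceptive function: all-ones is optimal, otherwise fewer ones is better."""
    n, ones = len(x), sum(x)
    return n if ones == n else n - 1 - ones


def landscape_fdc(fitness, n):
    """FDC over all bit strings of length n, with Hamming distance to all-ones."""
    space = list(product((0, 1), repeat=n))
    fitnesses = [fitness(x) for x in space]
    distances = [n - sum(x) for x in space]  # all-ones is the optimum of both toys
    return fdc(fitnesses, distances)


if __name__ == "__main__":
    for name, func in (("onemax", onemax), ("trap", trap)):
        r = landscape_fdc(func, 6)
        print(name, round(r, 3), classify(r))
```

For the first toy function fitness falls linearly with distance to the optimum, so the coefficient is approximately -1 and the landscape is classed as straightforward. For the trap, fitness rises with distance everywhere except at the optimum itself, so the coefficient is positive and the landscape is classed as misleading.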
Jones and Forrest (44) thus provide a quantitatively informed qualitative assessment of how difficult a landscape is to navigate. This is in line with the recent trend arguing that we need to be able to classify landscapes and then identify which algorithm is appropriate, rather than searching for an absolute identification of the difficulty of a landscape. In addition to providing information about the type of fitness landscape, the fitness distance correlation also manages to capture the challenge of deceptiveness, unlike the K/N ratio. However, Jones and Forrest (44) did not aim to capture neutrality, which can still influence the nature of the fitness landscape.
Thus, if neutrality is a feature that characterizes social science problems, caution is warranted when characterizing the fitness landscape by relying on K/N ratios (55). As Huynen et al. (56) argue, a small value for the fitness distance correlation (i.e. −0.15 < FDC < 0.15), which would normally be associated with a very rugged landscape, is not informative as to the ease or difficulty of finding the global optimum, since local optima, when connected, are no longer local. This is further explored by Lobo et al. (50), who conclude that there is an interplay between the ruggedness and the neutrality of the landscape. Their simulations suggest that the desirability of neutrality is contingent on ruggedness: in rugged landscapes neutrality is beneficial, while in smooth landscapes it merely makes adaptation slower.
In consequence, the quantitative ruggedness measures detailed in the previous section do not necessarily capture the relative ease or difficulty an adaptive solver would have on a landscape that has neutral ridges.

Changing Search Strategies
Typically, an agent is set to primarily engage in one-bit flips when searching the landscape. Yet, different search strategies (e.g. two-bit flips) not only reflect an assumption about search behavior, but also constitute variations in the distance metric that influence the shape of the fitness landscape itself. In other words, the same fitness function can lead to qualitatively different landscapes when assuming different kinds of search behavior. To illustrate this point, Fig. 2 shows an example of the same function mapped onto three different landscapes using three different assumptions about search behavior. To create 3D visualizations of the multi-dimensional landscape, we used a dimensionality reduction method called t-SNE (t-distributed stochastic neighbour embedding) (57), which transforms high-dimensional data to low-dimensional representations while approximately preserving pair-wise similarities. Given that the t-SNE algorithm is stochastic, as is the NK fitness function, it should be noted that this is one possible illustration of one possible NK landscape with N=8, K=3. The illustration is not a general result for all NK landscapes with N=8 and K=3. The visualization is informative in two ways. First, when assessing the visualization it is clear that the two-bit-flip search strategies (Fig. 2b) generate qualitatively different landscapes. Depending on the starting point, a subset of solutions is not connected in the graph. This is analogous to traversing a sequence of consecutive numbers in increments of two, which generates two distinct and unconnected subsets: under two-bit flips, bit strings with an odd number of non-zero bits are never connected to bit strings with an even number of non-zero bits. Thus, the definition of the neighborhood function can effectively reduce (relative to the entire search space) the size of the landscape. Second, and more importantly, the heat-map reveals information regarding the distribution of fitness scores.
The three neighborhood representations yield three different landscape topologies, i.e. smoother gradients such as the left-hand side of the two-bit flip mean that it would be easy for an agent to find the global optimum, while 'patchier' surfaces translate into a lower likelihood of success, such as the one-bit and the right-hand side landscape generated by the two-bit flip. Equivalently, in the 2D decimal representation, one can assess the difficulty of finding the global optimum (the highest fitness value), by looking at the shape of the generated curve (Fig. 2c). In the 2D case, since the decimal representation is arbitrary, the 'decimal' landscape is very 'rugged'; thus, a solver will likely be stuck in a suboptimal solution.
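The disconnection under two-bit flips can be verified independently of any fitness function. The sketch below builds the search graph for bit strings of length N=8 under 'flip exactly k bits' moves and reports the sizes of its connected components; the function name `component_sizes` and the integer encoding of bit strings are illustrative choices, not taken from the simulations reported here.

```python
from collections import deque
from itertools import combinations


def component_sizes(n, k):
    """Sizes of the connected components of the 'flip exactly k bits' graph."""
    # Each move is an XOR mask with exactly k bits set.
    masks = [sum(1 << i for i in positions) for positions in combinations(range(n), k)]
    seen = set()
    sizes = []
    for start in range(2 ** n):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search from an unvisited solution
            x = queue.popleft()
            size += 1
            for mask in masks:
                y = x ^ mask
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    print(component_sizes(8, 1))  # prints [256]
    print(component_sizes(8, 2))  # prints [128, 128]
```

With one-bit flips all 256 solutions form a single connected landscape, whereas two-bit flips preserve the parity of the number of non-zero bits and split the space into two halves of 128 solutions each. An agent restricted to two-bit flips therefore searches only the half in which it starts.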
Overall, the visualizations demonstrate that a landscape is not merely inherently easy or difficult, since the difficulty will depend on the agent's prevalent search strategy. The extent of ruggedness, as defined in Kauffman's original model, is assumed to be given by a one-bit mutation of the candidate solution (6). This was argued to be a reasonable assumption for how genes mutate and adapt. However, as Frenken et al. (58) point out, the assumption of one-bit flips is of limited relevance in the context of human search behavior, since human problem-solvers are not constrained to engaging in small, incremental changes. In fact, a number of experimental studies report an average Hamming distance of about 2.5 (9,27,59). While genetic mutations can be considered one-bit flips, human search is perhaps more akin to genetic manipulations carried out by geneticists. Thus, it appears problematic to consider one-bit flips archetypical, and in any case it turns out to be important which kind of search behavior is being modelled. To sum up, search behavior and the ruggedness of fitness landscapes are not independent, unless one restricts search behavior to local search (one-bit flips).
The challenge outlined above is not limited to a non-hierarchical NK landscape, but also extends to a hierarchical search environment. Fig. 3 shows the visualization of a hierarchical problem using a one-bit flip hill-climber (Fig. 3a) and a 'chunking' algorithm (Fig. 3b) that was tailored specifically for this problem (see Appendix 2 for a list of moves). Maps are generated based on agents in both setups starting from random points in the landscape. The underlying fitness function is identical, yet the visualization shows that the 'chunking' landscape turns out much smoother than the one-bit-flip landscape, and even more so when compared with the bitstring landscape. Since the H-XOR function has 2^(N/2) local optima for the one-bit flip hill-climber (as illustrated by the two global peaks in Fig. 3c), the probability that a given point in the one-bit landscape is connected by a path to the global optimum is significantly lower than for the chunking algorithm (Fig. 3d). In other words, this problem is 'deceptive' for a one-bit hill-climber but not for an agent relying on a problem representation that can exploit the problem structure. Again, a simple K/N ratio does not capture this type of landscape feature, and it is not possible to disentangle assumptions about search behavior from the assessment of the fitness landscape characteristics.

Discussion
While the simulation approach has gained attention in high-status outlets within the social sciences in general and organizational theory in particular, we here want to acknowledge and address the skeptical concerns still being raised about theoretical assumptions (60) and the weak empirical grounding of these assumptions (61-63). Much like in the original biological setup, the organizational literature has made simplistic assumptions about agent behavior, embedded in a relatively undefined fitness landscape (64,65). However, unlike in microbiology, where evolutionary forces are well known (66), defining human search behaviors and assessing the ruggedness of the fitness landscape in this conceptual framework turns out to be elusive.
Based on novel visualizations and simulations, we first showed that the neighborhood function influences how rugged a landscape will appear to the searcher. Since one cannot simply assume one-bit flip search (27,59), this turns out to be a fundamental challenge to any a priori assessment of fitness landscapes. Second, even if one assumes one-bit flip search, there are substantial challenges to traditional, quantitative measurements. Landscapes can be hierarchical and thus deceptive (46,59), and neutrality can influence how adaptive search unfolds in a given fitness landscape (16).
Given our limited understanding of the genotype-phenotype mapping in a technological setting (67), we suggest that the focus should not be on the statistical features of the landscape to be searched under the one-bit flip condition, but on how the interplay of search behaviors and the different natures of interdependence structures translates into problem-solving performance. One way forward, as suggested by current developments in computer science, is to revert to what Wright (4) and Kauffman (6) originally proposed: relying on fitness landscapes to first acquire a 'rough' image of a problem class, instead of investigating specific instances (7,34). This would entail moving beyond a simple, quantitative assessment of the topology (cf. the K/N ratio) or even the more sophisticated approach by Jones and Forrest (44).
These challenges illustrate that we need further empirical evidence on what relevant fitness landscapes actually look like, rather than assuming that a given K (which does not capture complexity) fits the empirical realm one is interested in. In biology, the empirical evidence for the existence of multi-modal landscapes with numerous epistatic interactions continues to increase (33), and in cultural evolution there is also an ongoing discussion about the degree to which the biological world resembles NK landscapes (10). Ganco (22) is a rare exception in the organizational literature, carrying out a comparative analysis of the NK framework with the fitness landscapes of patents in a given industry. However, this is only a single case study, and further studies are needed to identify relevant boundary conditions.
The NK model has been remarkably successful in advancing our ability to computationally model search challenges (19). Yet, we argue that moving away from 'armchair speculations' (Simon (68)) regarding human search behavior and the nature of the problem is paramount, as seemingly innocuous assumptions can drastically change problem-solving performance. This requires empirical investigations into what empirical landscapes actually look like, as well as acknowledging that any assessment of the fitness landscape necessarily needs to take the neighborhood function, i.e. search behavior, into consideration. This path should ideally lead to better simulations that inform our understanding of the empirical world and are further calibrated by empirical insights.