Perceived optimality of competing solutions to the Euclidean travelling salesperson problem

The Travelling Salesperson Problem (TSP) is a nondeterministic-polynomial hard (NP-hard) combinatorial problem that occurs in a wide range of industrial domains, including logistics, route finding, and computer wiring. Interestingly, despite the problem's inherent computational difficulty, when it is presented in Euclidean space (ETSP), human participants can produce close-to-optimal solutions in near-linear time. However, when asked to compare and select the most optimal solution from a set of pre-defined competing solutions, participants can struggle. In this study we investigate this paradox by asking participants to compare four closed-loop Euclidean TSP solutions, in order to determine which solution they perceived to have the most optimal tour cost. We hypothesise that the extracted geometric properties have an effect on stimulus selection in a discrimination task (selection or no selection). Accordingly, we extracted four geometric properties from competing stimuli in order to create a perceptual activation function. Predictive analytics demonstrated that a classification model could identify the most optimal solution 97% of the time using the perceptual activation scores alone, yet human participants only correctly determined the most optimal solution 47% of the time. Mixed-effects models suggest that 'likelihood of stimulus selection' can be modelled as a function of the weighted coefficients of competing perceptual activation scores within each trial; however, only a small amount of the variance is explained by these perceptual activation scores. Finally, a drift-diffusion model was used to create a theoretical framework of how likelihood of stimulus selection is influenced by competing perceptual activators. Our study highlights a novel way of extracting and analysing the importance of geometric properties that influence ETSP discrimination tasks, and links this analysis to human behaviour when discriminating between competing ETSP solutions.


Introduction
A travelling salesperson might plan to visit a list of cities before returning home. Naturally, the salesperson aims to (a) minimize the distance travelled (i.e., the tour cost), and (b) avoid visiting the same city twice. This classic problem, referred to as the Travelling Salesperson Problem (TSP), might seem simple; however, solving it is deceptively complex because the number of possible routes, defined by the equation (n − 1)!/2, where n is the number of cities, grows factorially. For example, a list of ten cities has 181,440 possible route solutions, yet adding just one more causes a ten-fold increase in possible routes to 1,814,400. For this type of combinatorial problem, a nondeterministic polynomial (NP) hard problem, exhaustive search algorithms soon fail to produce optimal solutions within a viable time frame. Accordingly, applied algorithmic approaches focus instead on producing tours that are considered to be 'good enough', or 'near-optimal', with a practical trade-off between processing time (efficiency) and tour length (effectiveness) (Kyritsis, Gulliver, Feredoes, & Din, 2018b).
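The growth in the number of routes can be checked directly. The following is a minimal sketch of the (n − 1)!/2 count quoted above; the function name `count_tours` is a hypothetical label introduced here for illustration.

```python
from math import factorial


def count_tours(n_cities: int) -> int:
    """Number of distinct closed tours through n cities: (n - 1)! / 2.

    The division by two reflects that a tour and its reverse have the same length.
    """
    if n_cities < 3:
        raise ValueError("a closed tour needs at least three cities")
    return factorial(n_cities - 1) // 2


# Ten and eleven cities reproduce the figures quoted in the text;
# the larger values show how quickly exhaustive search becomes impractical.
for n in (10, 11, 20, 30):
    print(n, count_tours(n))
```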
Interestingly, when the problem is presented in Euclidean space (ETSP) on a 2D graph, e.g., with dots/nodes representing cities, humans produce surprisingly efficient near-optimal tours in near-linear time (Dry, Lee, Vickers, & Hughes, 2006; Graham, Joshi, & Pizlo, 2000; Kyritsis et al., 2018b; MacGregor & Ormerod, 1996); a finding that also applies to navigational variants of the TSP, i.e., in real space rather than on Euclidean 2D graphs (Blaser & Wilber, 2013). The lack of any significant increase in processing time as a result of the number of nodes indicates that the production of ETSP solutions is not determined by local processing (System 2), but is the result of heuristic/global processing (System 1). Human solutions can, at times, even outperform computational heuristics (Kyritsis et al., 2018b), making it pertinent to model human behaviour in order to improve and/or develop faster algorithms that produce near-optimal ETSP solutions.
Existing research has identified that human solutions tend to be void of edge crossings and tend to display convexity, which has led researchers to propose a plethora of behavioural models, such as global-to-local precedence (e.g., the convex hull hypothesis and the quotient algorithm), local-to-global precedence (e.g., the crossing-avoidance hypothesis and pyramidal models), or a mix of both (e.g., generative models) (Kyritsis et al., 2018b; MacGregor & Chu, 2011; MacGregor, Chronicle, & Ormerod, 2004; Pizlo et al., 2006; Van Rooij, Stege, & Schactman, 2003). Despite differing opinions concerning which cognitive approach is dominant, there is consensus that several geometric properties, termed 'figural effects', impact human ETSP solution generation performance, e.g., indentations, number of edge crossings, and convexity (Kyritsis, Gulliver, & Feredoes, 2018a; MacGregor & Ormerod, 1996; van Rooij, Schactman, Kadlec, & Stege, 2006). An indentation is a notch on the edge or surface of the route convex hull. MacGregor (2012) investigated whether the number of indentations influenced participant perception of optimality in a discrimination task, i.e., when different TSP solutions were presented in pairs. The results of MacGregor's (2012) study indicated that, even though participant solutions tend to have fewer indentations, participants did not use the number of indentations when comparing optimality. Accordingly, there appears to be a discrepancy between the geometric properties and processes used to generate solutions, and those used to judge the optimality of existing tours. Nevertheless, there is very limited research aimed at understanding the figural effects that impact judgement of optimality (tour lengths) of existing tours, i.e., discriminating between different pre-existing solutions (comparing the efficiency of existing solutions for a specific node layout). In this research we hypothesise that extracted geometric properties have an effect on stimulus selection in a discrimination task (selection or no selection).
Ormerod and Chronicle (1999) found a linear relationship between aesthetic perception of an ETSP solution and the ratio between the number of internal nodes (i.e., nodes not included in the convex hull) and the number of nodes included in the convex hull. This result implies that participants visually prefer ETSP solutions where most nodes sit on the convex hull (e.g., implying that the solution shown in Fig. 1 left is preferred over the solution in Fig. 1 right). In contrast, however, Vickers, Lee, Dry, Hughes, and McMahon (2006), whilst controlling for regularity, number of potential intersections, number of internal nodes, and the number of nearest neighbours in each solution, failed to find evidence that the convex hull is an important determinant of aesthetic appeal. Dry and Fontaine (2014) focused on discrimination of tour lengths, and asked participants to choose the easiest ETSP graph to solve from four competing node layouts. The authors concluded that the number of convex hull nodes and the number of possible intersections served as the selection criteria for the discrimination test. This study explored selection between stimuli with a range of inherent geometric properties, particularly the number of possible intersections and the number of convex hull nodes. Although helpful in understanding ETSP solution formation, Dry and Fontaine (2014) did not consider participants' ability to compare pre-existing solutions, but instead requested participants to select from a set of inconsistent unsolved layouts.
MacGregor (2017) investigated the impact of two geometric properties, i.e., number of crossings (complexity) and area (convexity), on judgement of optimality between competing solutions. The author concluded that edge crossings and the area of the polygon were significant to the participants' judgement of optimality. Ultimately, solutions with fewer crossings, and solutions with larger polygon areas, were considered to be more optimal, with edge crossings the more important of the two criteria. Interestingly, the presence of any crossings in a solution, as illustrated in Fig. 2, renders the solution, from the view of a participant, inherently 'suboptimal'. In the absence of crossings, however, polygon-enclosed area is used as the primary selection criterion, i.e., with a larger convex area representing a more optimal solution.
Since some geometric properties have been shown to correlate (Vickers et al., 2006), it makes sense to compare stimuli as pairs, or groups of solutions, as this can reduce the uncertainty in findings that arises from unknown and/or uncontrolled variables. For this study, we were interested in exploring the impact of the geometric properties (i.e., regularity, area, tour cost, and path complexity) on judgement of ETSP optimality and ability to discriminate optimality. To satisfy this aim, a data mining and modelling approach was used to extract geometric properties from thousands of participant trials (independent variables) in order to better understand how they affect the process of target selection in a discrimination task (dependent variable), where participants were asked to decide between competing ETSP stimuli based on perceived optimality.

Experimental procedure
The experimental front-end platform was developed using HTML5 and JavaScript, whilst PHP and MySQL were used to code the back-end processes and data storage. The platform was tested on Google Chrome, Microsoft Edge, and Mozilla Firefox. Data collection was anonymous, i.e., no IPs or contact details were collected. However, PHP sessions were used to stop participants who (perhaps accidentally) tried to run the experiment multiple times. To allow collection of anonymous data, a unique number was generated and assigned to each participant.
Fig. 1. Two solutions to the same ETSP stimulus (generated using R). The left solution has a surface area of 5213 AU², while the right solution has a surface area of 2817 AU². Participants will typically judge the left solution as being more optimal. Note. AU is arbitrary units and depends on the screen resolution, but the ratio between the area of the solution and the total area remains consistent.
Randomly generated ETSP graphs were created using R by generating X and Y coordinate sets from a uniform distribution ranging from 0 to 100, such that $X = X_1, \ldots, X_n \sim U(0, 100)$ and $Y = Y_1, \ldots, Y_n \sim U(0, 100)$. The sets of samples were then combined to create graph nodes G, so that $G = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$. The number of nodes (n) ranged from 30 nodes to 70 nodes (in 10-node steps). Ultimately, 2,000 ETSP graphs were generated for each step increment (i.e., 2,000 30-node graphs, 2,000 40-node graphs, etc.), making 10,000 graphs in total. For each of the 10,000 graphs, with the help of the TSP library (Hahsler & Hornik, 2007), solutions were generated by applying four commonly used TSP solution heuristics: Nearest Neighbour; Nearest Insertion; Cheapest Insertion; and Farthest Insertion (see Table 1 for details of the four applied heuristics). The starting node was kept constant for all heuristics.
The stimuli presented to participants were rendered on four separate HTML5 canvases (see Fig. 3). The solutions were assigned to a cell position randomly, i.e., solutions generated by each of the heuristics had an equal likelihood of being assigned to each cell. A mouse listener was implemented to record participant mouse clicks as a method of logging target selection within each trial. Timestamps were also coded to record response time (RT), where $RT = Time_{end} - Time_{start}$.
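The stimulus generation described above can be sketched as follows. This is an illustrative Python reimplementation under stated assumptions, not the authors' R code: the study used the R 'TSP' library, whereas `random_graph`, `nearest_neighbour_tour`, and `tour_cost` are hypothetical helpers written here, and only the Nearest Neighbour heuristic is shown.

```python
import math
import random


def random_graph(n, rng):
    """Draw n nodes with X and Y coordinates from U(0, 100)."""
    return [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]


def nearest_neighbour_tour(nodes, start=0):
    """Greedy tour: from the current node, always move to the closest unvisited node."""
    unvisited = set(range(len(nodes)))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = nodes[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, nodes[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_cost(nodes, tour):
    """Length of the closed tour, including the edge back to the starting node."""
    n = len(tour)
    return sum(math.dist(nodes[tour[i]], nodes[tour[(i + 1) % n]]) for i in range(n))


rng = random.Random(42)          # fixed seed so the sketch is repeatable
graph = random_graph(30, rng)    # one 30-node stimulus
tour = nearest_neighbour_tour(graph, start=0)
print(f"nearest-neighbour tour cost: {tour_cost(graph, tour):.1f}")
```

Keeping `start` fixed mirrors the constant starting node used for all heuristics in the experiment.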
Upon providing consent, i.e., by clicking the "happy to provide consent" checkbox, a short tutorial appeared on the screen explaining the travelling salesperson problem and the experimental process. This was done using a combination of step-by-step descriptions and illustrations across two separate pages. For the sake of transparency, we have added the instructions in Appendix 3. Participants were made aware that all trials were timed and were asked to complete trials as quickly as possible without compromising the quality of their choices. Once the participants were ready to start, they were asked to click a hyperlink labelled "Start Experiment" at the bottom of the page.
Before each trial, a black cross was presented for two seconds in the middle of a grey screen, i.e., to maximise the chance that all participants started at the same gaze position. The cross was followed by presentation of the four possible solutions (see Fig. 3). Participants were instructed to click on the solution that they deemed to be the most optimal in terms of tour length, i.e., the least distance 'travelled' by the salesperson. The starting/finishing node was highlighted in green on all solutions throughout the experiment. Ten randomly selected trials were presented from each of the five increments, i.e., 50 trials per participant. Upon completion of the experiment, participants were directed to a 'Thank you' page, which asked them to close their browser tab. The entire procedure is summarised in Fig. 4.

Participant sample
The proposed experimental procedure received a favourable ethics review in accordance with the ethical standards of the School of Psychology and Clinical Language Sciences, at the University of Reading (UK). A total of 120 participants were recruited using simple random sampling through email and social media. Only participants who had normal or corrected-to-normal vision were invited to participate in the experimental study. All subjects were explicitly made aware that involvement in the study was voluntary and that they could withdraw at any time by simply closing the browser tab. Consent forms and information sheets were converted to HTML, and all participants were asked to confirm that they were happy to participate in the experimental study prior to sitting through the experiment. A script ensured that participants had ticked the box before allowing them to continue. Gender, age, and other personal data were not recorded for this study. No compensation was provided for participating in this study.

Extraction of figural effects
All data was analysed using R (R Core Team, 2016), the 'BayesFactor' library (Morey, Rouder, & Jamil, 2015), and the 'caret' package (Kuhn et al., 2018). We modelled the discrimination task using selection vs non-selection, as well as success vs failure (i.e., with success defined as the selection of the most optimal solution) as the two response variables, and the extracted figural effects as the predictor variables. Several studies (e.g., Ormerod & Chronicle, 1999; Vickers et al., 2006; Dry & Fontaine, 2014; MacGregor, 2017) suggested ways to extract tour convexity, tour complexity, and tour circularity, which have been shown to impact human problem-solving when generating solutions for the ETSP. The measure of circularity is closely related to 'compactness', i.e., the area of the polygon over its perimeter, which has been shown to influence aesthetic preference for polygons (Friedenberg and Bertamini, 2015). Therefore, circularity can be extracted quite easily in graphs with no internal crossings (i.e., crossings that occur by routes going back inside the polygon). When considering convexity, we suggest that graphs with internal crossings should be penalised, because crossings can result in areas inside the convex hull that are not considered part of the final graph solution (e.g., Fig. 5). Accordingly, instead of using the term 'convexity' for this measurement, we denote this measurement as $Area_{pen}$, since convexity is measured by taking $A_s/A_c$ (where $A_s$ is the polygon-enclosed area of the solution and $A_c$ is the area of the convex hull surrounding the solution).
Table 2 summarises the geometric properties that were extracted from the heuristic solutions. Note that since the ETSP stimuli were kept consistent in our experiment, we use the variability in the length of edges as an approximate measure of regularity, rather than the clustering of stimulus (as presented by Dry & Fontaine, 2014). By definition, an irregular polygon will have both unequal edge lengths and unequal angles; however, because of the uncertainty of what constitutes an 'internal angle' for intersecting polygons, we did not consider this measurement. A full description of how these geometric properties were extracted from the graphs has been added to Appendix 1.
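Three of the properties in Table 2 (cost, regularity, and complexity) can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the procedure of Appendix 1: crossings are detected with an orientation test rather than by solving the system of two linear equations the authors mention, the sample standard deviation is assumed for regularity, and all function names are hypothetical.

```python
import math
import statistics


def edges(nodes, tour):
    """Consecutive node pairs of the closed tour, including the return edge."""
    n = len(tour)
    return [(nodes[tour[i]], nodes[tour[(i + 1) % n]]) for i in range(n)]


def cost(nodes, tour):
    """Total tour cost: the sum of all edge lengths."""
    return sum(math.dist(a, b) for a, b in edges(nodes, tour))


def regularity(nodes, tour):
    """SD of the edge lengths (sample SD is an assumption made here)."""
    return statistics.stdev([math.dist(a, b) for a, b in edges(nodes, tour)])


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _properly_cross(e1, e2):
    """True if the two segments intersect at an interior point of both."""
    (a, b), (c, d) = e1, e2
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def complexity(nodes, tour):
    """Number of edge crossings; edges sharing a node are never counted."""
    es = edges(nodes, tour)
    return sum(_properly_cross(es[i], es[j])
               for i in range(len(es)) for j in range(i + 1, len(es)))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(complexity(square, [0, 1, 2, 3]), complexity(square, [0, 2, 1, 3]))
```

Visiting the corners of the unit square in order gives a crossing-free tour, whereas swapping two corners produces a single crossing between the two diagonals.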
Table 2
Description of geometric properties extracted from the TSP solutions.

Properties | Method
$Area_{pen}$ (a) | The polygon-enclosed area of the solution with a penalty for internal crossings
Complexity | Number of edge crossings
Regularity | Variation (SD) of the length of the edges in the solution
Cost | The total tour cost (i.e., the sum of the lengths of all edges that make up the solution)
(a) The area is an approximation that was extracted using Monte Carlo methods, which introduce a penalty for internal crossings.

Fig. 5. The Monte Carlo algorithm will penalize these graphs by ignoring the areas created internally (right). Note. The area formed within the polygon-enclosed area is not considered when using the Monte Carlo approach (described in Appendix 1). We suggest that these inner intersections mirror the lack of convexity in these solutions.

In our experimental set, to apply a penalty to solutions that contain internal crossings, we did not consider the internal area created by crossings (as shown in Fig. 5). It is important to clarify that this is a numerical method that penalizes irregular solutions that contain internal crossings, and are therefore difficult to measure in terms of convexity. We are not suggesting that the human visual system processes the information in this way. Rather, we suggest that the presence of internal crossings deters participants from considering any solution that contains internal crossings as a possible candidate in the discrimination task, limiting perceptual consideration of area.
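A Monte Carlo area estimate of the kind referred to above can be sketched as follows. This is only one possible reading: the exact procedure is given in Appendix 1, which is not reproduced here, so the even-odd inclusion rule (under which regions enclosed an even number of times by a self-intersecting tour are treated as outside) is an assumption, and `inside_even_odd` and `monte_carlo_area` are hypothetical names.

```python
import random


def inside_even_odd(point, polygon):
    """Ray-casting test: a point is inside if a horizontal ray crosses the
    polygon boundary an odd number of times (even-odd rule)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def monte_carlo_area(polygon, n_samples=20000, lo=0.0, hi=100.0, seed=0):
    """Estimate the enclosed area as (box area) x (share of random points inside)."""
    rng = random.Random(seed)
    hits = sum(
        inside_even_odd((rng.uniform(lo, hi), rng.uniform(lo, hi)), polygon)
        for _ in range(n_samples)
    )
    return (hi - lo) ** 2 * hits / n_samples


# A 50 x 50 square inside the 100 x 100 stimulus space has a true area of 2500.
box = [(25, 25), (75, 25), (75, 75), (25, 75)]
print(f"estimated area: {monte_carlo_area(box):.0f}")
```

Because the estimate is stochastic, it approaches the true area only as `n_samples` grows; the fixed seed is used here purely to make the sketch repeatable.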

Perceptual activation of properties in the discrimination task
Each trial contained four measurements for each of these geometric properties, i.e., area, tour costs, complexity (number of edge crossings), and regularity (standard deviation of the length of edges in solutions). Since studies suggest that humans approach the Euclidean TSP using global processing, we presume that properties between competing solutions (stimuli) are computed at a perceptual level, i.e., in order to support decision-making in near-linear time. Hence, rather than using absolute measurements of each stimulus separately, the difference in the quantity of geometric properties between each stimulus and its surrounding neighbours (for each trial) was extracted using the formula:
$$\text{perceptual activation score} = \omega_1 - \frac{\omega_2 + \omega_3 + \omega_4}{3}$$
where $\omega_1$ is the score of the geometric property of any of the four graphs, and $\omega_2$, $\omega_3$, $\omega_4$ are the geometric scores of the remaining graphs in the trial (i.e., its competitors); see Fig. 6.
In Fig. 6, the top left stimulus has one intersection. Therefore, the 'perceptual activation score' for complexity would be one minus the average number of intersections in the other graphs, giving it a score of negative one. This approach allows us to make sense of trials where complexity is equal for all four competing stimuli, i.e., they all have the same number of crossings, and therefore complexity cannot be used as a guidance property. Note that for regularity, complexity, and tour costs, perceptual activation scores decrease with optimality. Area perceptual activation scores increase as optimality increases (see Fig. 7).
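The perceptual activation score described above (a stimulus's own score minus the mean score of its competitors) can be sketched in a few lines. The function name is hypothetical, and the competitor crossing counts in the example are assumed values chosen so that the first stimulus reproduces the score of negative one from the worked example.

```python
def perceptual_activation(scores):
    """For each stimulus in a trial, return its score minus the mean score
    of the remaining stimuli (its competitors)."""
    n = len(scores)
    if n < 2:
        raise ValueError("a trial needs at least two competing stimuli")
    total = sum(scores)
    return [s - (total - s) / (n - 1) for s in scores]


# Hypothetical trial: one stimulus with a single crossing, three with two crossings.
crossings = [1, 2, 2, 2]
print(perceptual_activation(crossings))

# When all stimuli have the same number of crossings, every score is zero,
# so complexity offers no guidance in that trial.
print(perceptual_activation([3, 3, 3, 3]))
```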

Human performance in the discrimination task
The point-estimated probability of a participant selecting the most optimal route was 0.47 [Bootstrapped 95% Confidence Intervals from 5000 samples: 0.43 < μ < 0.51]. Interestingly, existing work in the field (see MacGregor, 2017) placed the probability of successfully discriminating competing tours higher: at 66% when based on complexity (i.e., crossings), and at 91% when based on area. It seems unwise to ignore or dismiss these existing findings. Accordingly, we used a Bayesian binomial analysis and took the rounded mean of the two readings as the maximum likelihood estimate of a beta prior, i.e., θ ∼ Beta(16, 4). In turn, we calculated the likelihood of occurrence, i.e., p(D|θ), and the combined knowledge was used to generate a posterior distribution using Markov Chain Monte Carlo (2 chains, 20,000 iterations), i.e., p(θ|D) ∝ p(D|θ)p(θ). Our results suggest that the maximum likelihood estimate for the posterior is θ = 0.54, with a 95% credible interval ranging from 0.46 to 0.63. Regardless, our findings are much less optimistic than findings from other studies.
Existing studies propose that several figural effects from Euclidean graphs impact human performance when generating solutions to the ETSP. These figural effects include: distance and variability between the nodes that form the convex hull of the graphs; the number of nodes that make up the convex hull; the number of potential intersections in the graph; clustering of the graphs; and the variability of the angles of the convex hull (Dry, Preiss, & Wagemans, 2012; Kyritsis et al., 2018; Vickers, Lee, Dry, & Hughes, 2003). A MANOVA with the aforementioned figural effects set as dependent variables and success (a binomial variable, i.e., yes/no) set as the independent variable indicated that there was an overall significant difference in some of these properties when comparing successful and failed trials [F(6,4223) = 9.66, p < 0.001, V = 0.01]. Independent ANOVAs, followed by Tukey HSD as a post-hoc test for multiple comparisons, indicated that the following measures were significantly higher in successful vs failed trials: distance of nodes in the convex hull of the graph (Mdiff = 1.34, CI 95% [0.35, 2.35], p < 0.001), variability (standard deviation) of angles in the convex hull (Mdiff = 1.13, CI 95% [0.50, 1.77], p < 0.001), and clustering of the graph (Mdiff = 3.03, CI 95% [2.03, 4.04], p < 0.001). However, it is worth noting that in all cases the effect size difference, calculated using Cohen's d, was negligible: d = 0.08, d = 0.11, and d = 0.18, respectively. Therefore, we found evidence suggesting that some of the figural effects previously shown to impact human performance when generating TSP solutions in Euclidean space have only a negligible effect on human performance when discriminating between existing solutions.
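For readers less familiar with Bayesian binomial analysis, the prior-to-posterior step can be written in closed form. This is a standard conjugacy result and is offered only as an illustration: the authors report using Markov Chain Monte Carlo rather than this closed form, and the symbols k (number of successes) and n (number of observations) are introduced here and are not reported in the text.

```latex
% Beta prior combined with a binomial likelihood gives a Beta posterior
\theta \sim \mathrm{Beta}(a, b), \qquad
k \mid \theta \sim \mathrm{Binomial}(n, \theta)
\;\Longrightarrow\;
\theta \mid k \sim \mathrm{Beta}(a + k,\; b + n - k)

% Prior mean for the prior used in the text, a = 16 and b = 4
\mathbb{E}[\theta] = \frac{a}{a + b} = \frac{16}{20} = 0.8

% Posterior mode (valid when a + k > 1 and b + n - k > 1)
\hat{\theta}_{\mathrm{mode}} = \frac{a + k - 1}{a + b + n - 2}
```

The posterior therefore sits between the optimistic prior (mean 0.8) and the observed success rate, which is consistent with the posterior estimate of 0.54 lying above the point estimate of 0.47.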

Likelihood of selection as a function of perceptual activation scores
In every trial, participants selected one graph that they perceived to be the most optimal TSP solution. Regardless of whether the selection was 'correct', i.e., was indeed the most optimal graph, we expect that the perceptual activation scores of the figural effects considered in this study influenced that choice. Stimulus selection can be thought of as a binomial dependent variable. Accordingly, we examined the likelihood of a specific stimulus being either selected (or not) as a function of the weighted influence of its associated perceptual activation scores. Since the experiment relied on a repeated measures design, a generalised linear mixed model was implemented using the 'lme4' package (Bates, Mächler, Bolker, & Walker, 2014), with ID as the random effects variable for participant ID, tID as the trial ID that varied for each participant, and pArea, pComplexity, pRegularity, and pCost as the perceptual activation scores for, respectively, $Area_{pen}$, number of edge crossings, standard deviation of edges, and tour costs.
For model comparisons, predictors from the model were removed in descending order of standardised coefficients. To evaluate how much the model suffered at each step, we compared the delta of BIC (Bayesian Information Criterion) to the best fitting model. This in turn allowed us to evaluate model fits using Bayes Factors analysis, which Wagenmakers (2007) suggested could be extracted from ΔBIC using:
$$BF_{01} \approx \exp\left(\frac{\Delta BIC_{10}}{2}\right), \qquad \Delta BIC_{10} = BIC(H_1) - BIC(H_0)$$
Comparisons of each model, including the predictors, BIC scores, Bayes Factors, marginal $R^2$, and conditional $R^2$, can be found in Table 3.
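The BIC approximation to the Bayes Factor proposed by Wagenmakers (2007) is straightforward to compute. The sketch below takes the BIC of a candidate model and the BIC of the best fitting model; the function name and the BIC values in the example are hypothetical and are not taken from Table 3.

```python
import math


def bayes_factor_from_bic(bic_candidate, bic_best):
    """Approximate Bayes Factor in favour of the best fitting model,
    exp(delta_BIC / 2), where delta_BIC = BIC(candidate) - BIC(best)."""
    return math.exp((bic_candidate - bic_best) / 2.0)


# Hypothetical BIC scores: the candidate model is worse by 9.54 BIC units.
bf = bayes_factor_from_bic(bic_candidate=5009.54, bic_best=5000.0)
print(f"the candidate is about {bf:.0f} times less likely to fit the data")
```

A value of 1 means the two models are equally supported; larger values indicate stronger support for the best fitting model over the candidate.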
Marginal and conditional $R^2$ were extracted with the help of the 'MuMIn' package (Barton, 2020). Frequentist analysis suggested that graph volume (i.e., the number of nodes) significantly impacts the chance of successfully selecting the most optimal target; however, the effect size was small [χ²(4) = 22.58, p < 0.001]. The independent multinomial Bayes Factor analysis resulted in $BF_{10}$ = 1.19, which, according to Kass and Raftery (1995), is not sufficient evidence against independence of the groups; i.e., there does not appear to be credible evidence of an association between graph volume and the chance of successfully selecting the optimal solution.
Model comparisons using Bayes Factors indicate that model three, which uses the perceptual activation scores of area, complexity, and cost (see Table 3), was the best fitting model. The next best model (i.e., model four), which also included regularity, was ~118 times less likely to fit the data. According to Kass and Raftery (1995), our result is decisive evidence in favour of model three, and provides credible evidence that all perceptual activation scores, with the exception of regularity, played a key part in driving stimulus selection.
The standardised coefficients (a measure of variable importance), the z values, and the p-values of the best fitting model can be found in Table 4, which shows that area, when considering the penalty term for internal edge crossings, is more meaningful in determining likelihood of selection than edge crossings alone.
Table 3 shows that fixed effects accounted for 14% of the variance in stimulus selection, i.e., the marginal variance $R^2_m$, while random effects accounted for an extra 6% of the variance. Although some predictors were correlated (see Table 5), the model did not suffer from major multicollinearity issues (all VIF scores were under 5). This result indicated that the perceptual activation scores were at the very least not linear combinations of each other and, therefore, showed fair independence.
The logistic model provides evidence to partially support our initial hypothesis, i.e., that the perceptual activation scores of the extracted geometric properties have an effect on stimulus selection in a discrimination task (selection or no selection).However, the model is not very informative with respect to the process of making a choice (selection or no selection), and how reaction time (RT) is affected by competing perceptual activation scores.

Modelling selection and reaction time using a drift-diffusion model
The selection process, and its impact on RT, can be considered using diffusion models. A diffusion model belongs to the family of sequential-sampling models, and is often applied to the analysis of discrimination tasks that contain two or more choices; for a comprehensive review of diffusion models see Ratcliff and Smith (2004) or Wagenmakers (2009). Diffusion model theory states that as participants are presented with stimuli (cues), they accumulate the cues presented to them, process the cues, and ultimately make a choice from selections A or B (Alexandrowicz, 2020). The process itself, however, is noisy and stochastic, often simulated as a random walk, with drift parameters rather than constant slope terms (i.e., drift-diffusion models). This is reasonable as (i) cognitive tasks are not guaranteed to lead to success, even in the most informative environments, (ii) decisions may vary, even for trials with the same conditions and even in repetitions by the same participant, and (iii) there is variation in the time taken to complete the task, even when participants repeat the same task (Smith & Ratcliff, 2015). Drift-diffusion models return several parameters, which help the process of decision-making to be understood. The main parameters that we consider in the analysis are discussed in more detail by Voss, Rothermund, and Voss (2004) and are summarised in Table 6.
The unstructured nature of our experimental design meant that reverse consideration of trials was required, i.e., identification of trials where only one perceptual activation score differed significantly compared to their competitors.As there was not enough information about the competitor stimuli to allow comparisons between them, we consider the outcome of each trial as being binomial: either a graph was selected by the participant or it was not.In the end, we isolated several trials that allowed consideration of conditions as shown in Table 7.
Table 6
Description of the three main parameters returned by drift-diffusion models.

Parameter | Description
v | The mean drift rate that influences the direction that a process would take towards boundary A or B
a | The distance between the thresholds of the decision boundaries of two choices: A and B
z | The starting point of the process. This is indicative of bias towards one decision over the other.
Note. For a visual illustration of how the parameters affect decisions in our choice discrimination task see Fig. 8.

Table 7
Conditions used as inputs in the drift diffusion model.

Note. Regularity has a negative slope, which implies that it is not a strong enough perceptual activator to drag away from the default diffusion towards non-selection (as shown in the intercept term).
Note. Bayes Factors were taken in relation to the most likely fit (i.e., the lowest BIC score), as shown by Dry et al. (2006). In this case, model two is 1510 times less likely to fit the data than model three.

Next, to ignore non-decision time, which was not factored into our experimental design (see Verdonck and Tuerlinckx (2016) for a description), we applied the D*M drift-diffusion model method, using the 'DstarM' package in R (van den Bergh, Tuerlinckx, & Verdonck, 2020) to extract the parameters highlighted in Table 6. To support model building, the data was split to ensure balance, i.e., resulting in consideration of only (i) the selected stimuli, and (ii) an equal number of non-selected stimuli (i.e., one of the non-selected stimuli from each trial). Moreover, only trials that lasted less than seven seconds were considered, as this delay is in line with expected results from a global processing task. In total there were 2114 trials. The parameters for each condition are summarised in Table 8. Fig. 8 shows, for the sake of demonstration, a hypothetical trial using the average slopes of each condition in a random walk simulation.
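A random walk simulation of the kind mentioned above can be sketched as follows. This is a generic illustration of a drift-diffusion process, not the D*M estimation procedure and not the simulation behind Fig. 8: the parameter values are invented, z is treated as an absolute starting point between 0 and a (an assumption), and `simulate_trial` is a hypothetical helper.

```python
import math
import random


def simulate_trial(v, a, z, rng, dt=0.001, noise_sd=1.0, max_time=7.0):
    """Simulate one noisy evidence-accumulation path.

    v: mean drift rate, a: boundary separation, z: starting point (0 < z < a).
    Returns (choice, decision_time); choice is 'A' (upper boundary),
    'B' (lower boundary), or None if no boundary is reached within max_time.
    """
    x, t = z, 0.0
    step_sd = noise_sd * math.sqrt(dt)
    while t < max_time:
        x += v * dt + rng.gauss(0.0, step_sd)  # drift plus Gaussian noise
        t += dt
        if x >= a:
            return "A", t
        if x <= 0.0:
            return "B", t
    return None, t


rng = random.Random(1)  # fixed seed for repeatability
results = [simulate_trial(v=1.5, a=2.0, z=1.0, rng=rng) for _ in range(500)]
share_a = sum(choice == "A" for choice, _ in results) / len(results)
print(f"share of simulated trials reaching boundary A: {share_a:.2f}")
```

With an unbiased starting point (z = a/2), a positive drift pulls most paths towards boundary A and a negative drift towards boundary B, while the noise term accounts for trial-to-trial variation in both choice and decision time. The seven-second cap mirrors the trial duration limit used in the analysis.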
Both D*M drift-diffusion and logit models show that the effect on selection follows the order: $Area_{pen}$, Cost, and then Complexity. As such, the models have illustrated two things: firstly, that perceptual activation scores of geometric properties work as weighted inputs of a function that returns the likelihood of stimulus selection in the discrimination task, and secondly, that perception of area, with the associated penalty term for internal crossings, seems to be particularly influential in guiding human selection.

Likelihood of success as a function of perceptual activation scores
To validate our extracted properties as drivers for determining optimality in the discrimination task, we used a predictive analytics approach.It is already well established that the most optimal solution will never contain edge crossings, however, less is known about the properties of regularity and the area of the enclosed polygons.In this case, it would not make sense to include tour costs in our model, since the most optimal solution will have the lowest tour costs.A logistic model was fit, i.e., to evaluate whether perceptual scores can make any meaningful prediction of the likelihood of a stimulus being successful (note that we did not account for random effects, since no generalisations about human behaviour were made in this section).
The results of the full model show that all three perceptual activation scores were significant predictors of the likelihood of a stimulus being successful (see Table 9 for model comparisons). As before, variables were removed in order of least importance from the full model, i.e., according to their standardised coefficient scores.
We were interested in evaluating how this model would perform on unseen data. To provide a proof of concept, the dataset was split into a training and a test set (80/20 split) with the help of the 'createDataPartition()' function found in the 'caret' library (Kuhn et al., 2018), and the full model was then used to predict the probability of unseen trials being classified as successful. The Receiver Operating Characteristic (ROC) curve (shown in Fig. 9) illustrates that a suitable cut-off (trade-off between sensitivity and specificity) is ~ 0.7.
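As a rough illustration of the split and cut-off selection described above, the following sketch uses plain Python rather than the 'caret' workflow. It assumes predicted probabilities are already available, uses a simple random split (not the stratified sampling that 'createDataPartition()' performs), and picks the threshold whose ROC point lies closest to the top-left corner. All function names are hypothetical.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (train, test) lists; a plain random split."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def roc_points(probs, labels):
    """Return (threshold, sensitivity, specificity) per candidate threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        points.append((t, tp / positives, tn / negatives))
    return points

def best_cutoff(probs, labels):
    """Threshold whose (sensitivity, specificity) is closest to (1, 1)."""
    return min(roc_points(probs, labels),
               key=lambda pt: math.hypot(1 - pt[1], 1 - pt[2]))[0]
```

The closest-to-corner criterion is one common way to read a cut-off from an ROC curve; other criteria may give a slightly different threshold.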
The ROC curve indicated that an observation should be classified as a success when its predicted probability is higher than 0.7. More formally,

$$\text{class}_i = \begin{cases} 1 & \text{if } \hat{y}_i > 0.7 \\ 0 & \text{otherwise} \end{cases} \qquad \text{(v)}$$

where $\hat{y}_i$ are the fitted probabilities of the logit model on previously unseen data (i.e., the predictions on the test set). Applying the cut-off of 0.7, as shown in (v), leads to the confusion matrix found in Table 10. The most interesting finding from this confusion matrix is the very high positive predictive value, which is the probability of a successful trial given a positive prediction, i.e.,

$$P(\text{Success} \mid \text{Positive}) = \frac{P(\text{Success AND Positive})}{P(\text{Positive})}$$

The result shows that, using just these three variables, our model can accurately predict, 96% of the time, whether a stimulus is likely to be the most optimal solution. Interestingly, however, participants did not perform as well as the regression model in the task, which suggests there are other factors influencing humans when selecting the most optimal from multiple pre-defined ETSP solutions.
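The positive predictive value can be computed directly from the confusion matrix counts. The sketch below assumes 0/1 labels and a strict "greater than" cut-off of 0.7; the function names are hypothetical and the inputs in any call are made-up values, not the study's data.

```python
def confusion_matrix(probs, labels, cutoff=0.7):
    """Count TP, FP, TN, FN when predicting success for probs above cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, y in zip(probs, labels):
        predicted_success = p > cutoff
        if predicted_success and y == 1:
            counts["tp"] += 1
        elif predicted_success and y == 0:
            counts["fp"] += 1
        elif not predicted_success and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def positive_predictive_value(counts):
    """P(Success | Positive) = TP / (TP + FP)."""
    predicted_positive = counts["tp"] + counts["fp"]
    if predicted_positive == 0:
        raise ValueError("no positive predictions at this cut-off")
    return counts["tp"] / predicted_positive
```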

Discussion
In this study we presented ETSP solutions (with node volume between 30 and 70) to participants in an online experiment. Participants were asked to select the most optimal solution, i.e., the stimulus with the shortest tour length, in a discrimination task between four competing stimuli (ETSP solutions). Solutions were produced using common TSP heuristics (Nearest Neighbour; Nearest Insertion; Cheapest Insertion; and Farthest Insertion) and were randomly positioned on a 2x2 grid. Data mining was used to extract geometric properties which the literature suggests would aid human selection, and selection response time, in an ETSP solution discrimination task. The extracted properties were complexity (edge crossings), polygon area with an associated penalty term for internal crossings, regularity (SD of edge lengths), and tour costs. Complexity was extracted from stimuli through a system of two linear equations (see xix in Appendix 1). Area of the enclosed polygon was extracted using the Monte Carlo method (see formulas vi to xvi in Appendix 1). Regularity and tour costs were extracted by measuring edge lengths and, in the case of regularity, by taking the standard deviation of all edge lengths in a solution (see formula xxvi in Appendix 1). Finally, the properties were used as inputs in a perceptual activation function, which returned the difference between the properties in competing stimuli, rather than treating them as absolute values. Admittedly, one of the limitations of our study was that only a small set of properties that impacted target selection were investigated, most of which had been previously discussed in literature (with the exception of regularity); i.e., we are not presenting an exhaustive list of variables.
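The edge-length-based properties and the perceptual activation idea can be sketched as follows. This is an illustration under assumptions: tours are closed lists of (x, y) nodes, regularity uses the population standard deviation (the study may have used the sample version), and activation is taken as a stimulus's value minus the mean of its competitors, following the description in the Fig. 6 caption. All function names are hypothetical.

```python
import math
import statistics

def edge_lengths(tour):
    """Edge lengths of a closed tour given as a list of (x, y) nodes."""
    n = len(tour)
    return [math.dist(tour[i], tour[(i + 1) % n]) for i in range(n)]

def tour_cost(tour):
    """Total length of the closed tour."""
    return sum(edge_lengths(tour))

def regularity(tour):
    """Standard deviation of edge lengths (population SD assumed here)."""
    return statistics.pstdev(edge_lengths(tour))

def perceptual_activation(values):
    """Each stimulus's value minus the mean value of the other stimuli."""
    total, n = sum(values), len(values)
    return [v - (total - v) / (n - 1) for v in values]
```

For example, with hypothetical tour costs `[4, 2, 2, 2]` for four competing stimuli, the first stimulus receives an activation of 2 and each of the others about -0.67, so the scores express relative rather than absolute differences.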
Participants performed quite poorly in this task (point estimate of 47% success rate), and success was only negligibly influenced by the usual properties often shown to occur in the formation of participant solutions (such as convex hull parameters). Poor participant performance contrasts with the more optimistic results found by MacGregor (2017); however, we speculate that the difference in results is due to (a) the noisy nature of online experiments, and (b) the fact that our competing solutions were generated on the same graph using four different heuristics, none of which are 'state of the art'. It is therefore possible that participants performed more poorly than anticipated due to all solutions simply being too underwhelming.
Mixed-effects linear modelling was used to evaluate whether the probability of stimulus selection can be expressed as a function of perceptual activation scores. Our findings suggest that differences in polygon area, complexity, and tour costs explain a small, but significant amount of the variance in participant selections in the discrimination task, with about half of that attributed to conditional / random effects. Regularity did not predict likelihood of selection, despite being a significant predictor of optimality when used as an input in a logit classification model. Furthermore, the standardised effect sizes indicate that the perceptual activation score from the polygon area measurement, when accompanied by a penalty term for overlapping areas caused by internal crossings, was the most important cue for target selection, followed by complexity, and then tour costs. This finding was reinforced by our drift-diffusion model, which also showed a larger drift-rate for area, followed by complexity, and finally cost (see Fig. 8 for the output of a random walk simulation given the parameters of the drift-diffusion model). It is worth noting that we were forced to set a cut-off RT for this model at seven seconds to be more in line with what one would expect from a global processing task. This resulted in a loss of about half of the trials and a significant drop, therefore, in statistical power. However, the drift-diffusion model is an early theoretical framework for modelling the process involved in reaching a decision (selection between competing stimuli) boundary in the task through accumulation of evidence (i.e., perceptual activation scores); though this needs to be validated in a lab setting whilst testing for other properties that impact judgement of optimality in this discrimination task.
As mentioned earlier, the area of graphs in our study, when using the Monte Carlo method, was penalised by the presence of internal crossings (i.e., crossings that occur by routes that go back inside the polygon itself, as shown in Fig. 5). Our data suggest that people are averse to selecting these types of graphs when judging optimality, more so than other solutions that contain external crossings. As an ad hoc follow-up to these findings, we asked ten participants to judge two solutions, one with internal crossings, one with external crossings, while consistently maintaining all other properties. The emerging codings identified that solutions with internal crossings were perceived as being "messy", "intertwining", "back-and-forth", and "looping". As expected, all participants described solutions with internal crossings as being less optimal. Our anecdotal evidence suggests that, despite the presence of crossings being a well-known factor that impacts human performance in the TSP, for the discrimination task this can be further broken down into internal and external crossings. From a perceptual viewpoint, we believe the presence of internal crossings more strongly activates the diffusion process towards the non-selection boundary than external crossings. We aim to explore the reason for this further in future work.
Our results are consistent with the work of MacGregor (2017), who argued that both complexity (edge crossings) and area are important cues for discriminating between ETSP tour lengths. Furthermore, we expanded the area measurement to include a penalty for internal crossings so as not to over-bias the estimated area that these solutions would otherwise create.
Moreover, our results agree with previous findings, which suggest that geometric properties in the ETSP task correlate with each other (Vickers et al., 2006), which, as MacGregor (2007) stated, "suggests a nexus of stimulus characteristics that people could use to judge the lengths of tours". Our correlation matrix (Table 5) shows that properties were significantly correlated; however, this is not surprising in practice, since altering one will subsequently alter the others. It is important to note that our logistic model did not indicate serious multicollinearity problems (all VIFs < 5), which at least leads us to believe that the perceptual activation scores were fairly independent, and not just linear combinations of each other.
Finally, a logit model, with success as the binomial response variable and perceptual activation scores as the predictors, can be used to detect the most optimal solution 96% of the time in the discrimination task, which shows that extracted geometric properties (i.e., area, complexity, and regularity) can be used effectively to distinguish between ETSP solutions; however, there are other factors impacting human trials.

Conclusion
Studies have consistently shown that some humans can outperform even the best commonly used heuristics when solving ETSP graphs (see Kyritsis et al., 2018b). Although humans are good at ETSP problem solving, they performed poorly in a discrimination task where they were asked to judge between four competing ETSP stimuli and select the most optimal solution. Our initial probability of success (p(success) ~ 0.47) was much lower than the rate presented by MacGregor (2017) (0.66 to 0.91), and our MCMC Binomial Bayesian analysis, using the information provided by MacGregor as a prior, showed a likelihood estimate of the participant selecting the most optimum solution as being 0.54 (95% credible interval [0.46, 0.63]).
We mined geometric properties from the ETSP solutions generated for our experiment and used them in two models (a mixed-effects model, and a drift-diffusion model) to both evaluate the importance of the variables in predicting human discrimination of optimality between competing tours, and to assess to what extent measures of area, complexity, regularity, and tour costs can be used as predictors of stimulus selection. The study looked at the effect of measures when predicting optimality amongst competing stimuli. Interestingly, although our models can accurately predict the most optimal stimulus in trials 96% of the time (on previously unseen data), the same geometric properties only explain a very small amount of the variance in participant selections (~14%), with another 6% being down to random effects. Ultimately there is definitely more 'going on' here, which we aim to investigate in future work. To this end, we have made our data and our scripts available online for the sake of replicability, as well as to generate additional interest in the area (see Appendix 2 for the URL).
The results from the drift-diffusion models were promising, and we certainly plan to use this method more in future studies; however, much work is needed to fully identify the factors impacting human selection between pre-existing ETSP solutions. We aim to undertake future research that uses the fact that all geometric properties correlate, to explore the effects on human selection in a more controlled experiment. Moreover, consideration of personal, information assimilation, and individual difference factors is planned in future work.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

the intersection point $(x_I, y_I)$ is determined. This works well in most cases; however, there are two exceptions to this rule. The first condition, as shown in (vii), is that if the edge projects a vertical line, then the slope is undefined, i.e., $x(G_2) = x(G_1)$. This happens more regularly when $G \in \mathbb{N}^2$, as was the case with our graphs. In these cases, $x_I$ is set as $x_I = x(G_1)$ and $y_I$ as $y_I = y(P)$. The second condition is that the slope of the edge cannot be equal to 0, shown in (ix), i.e., the edge cannot be horizontal. If that is the case, then the result would be undefined, since you would have parallel lines that would never intersect. Algorithmically, one solution is to return a non-meaningful coordinate, e.g., containing negative numbers, which is outside the bounds of our problem space.
Implementation of the above algorithm will return all intersections with the horizontal line; however, not all intersections are meaningful, since we are only interested in finding the intersections within the bounds of the edges themselves (not their linear projections). To check if an intersection is within the bound of the TSP problem space, an indicator function can be used. Suppose the intersection point $I = (x_I, y_I)$; then the indicator function returns 1 when $I$ falls within these bounds, and 0 otherwise. Algorithmically, optimisation of the process can be achieved when determining whether a graph point is within the polygon by checking intersections to either the left ($x(P) < b/2$) or the right ($x(P) \geq b/2$), where $b$ is the maximum number from the uniform distribution of points that were used to generate the graph (i.e., 100). By extending this iteratively to all edges in the graph the result can be determined (see Fig. A3).
If the number of intersections is even, then the point must lie outside the polygon. If the number of intersections is odd, the point is inside the polygon. Formally, assigning that number to a variable $v$, our final indicator function returns the modulus of $v$, which will inform us whether the point is inside or outside the polygon, i.e.,

$$o(v) = v \bmod 2$$
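The parity rule above can be sketched in a few lines. This is an illustrative version under assumptions: the polygon is a closed list of (x, y) nodes, the horizontal line is only followed to the right of the point (rather than choosing left or right depending on the position of the point), and horizontal edges are skipped instead of returning an out-of-bounds coordinate. The function names are hypothetical.

```python
def crossings_to_the_right(point, polygon):
    """Count polygon edges crossed by a horizontal ray from point towards +x."""
    px, py = point
    n = len(polygon)
    count = 0
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edge: parallel to the ray, never intersects
        # The edge spans the ray's height only if its ends lie on either side;
        # the strict/non-strict pairing counts a shared vertex once.
        if (y1 > py) != (y2 > py):
            if x1 == x2:
                x_i = x1  # vertical edge: the intersection has the edge's x
            else:
                x_i = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_i > px:
                count += 1
    return count

def o(v):
    """Parity indicator: 1 if the point is inside the polygon, 0 otherwise."""
    return v % 2

def is_inside(point, polygon):
    """True if an odd number of edges is crossed."""
    return o(crossings_to_the_right(point, polygon)) == 1
```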
The procedure can be used with an arbitrarily large number of points, allowing us to determine the number of points that fall within the polygon using the function $o(v)$. In other words, if we were to create $n$ points, and run process steps (vi) to (xiv) $n$ times, each time updating $x(P)$ and $y(P)$, then the sum of the $o(v)$ outputs can be used to assign a value representing the number of points inside the polygon,

$$\tau = \sum_{i=1}^{n} o(v_i) \qquad \text{(xv)}$$

As we increase $n$, results tend towards a better estimate of the area of the polygon by taking the ratio

$$w(\tau, n) = \frac{\tau}{n}$$

where $w$ is a function of $\tau$ and $n$, and returns an estimate of the true area of the polygon, which can be denoted as $\theta$. Naturally, $\lim_{n \to \infty} w(\tau, n) = \theta$; however, this iterative process (i.e., our Monte Carlo method) is computationally expensive. For one forty-node graph, it took the University Academic Cluster approximately four seconds to compute $w$ when $n$ = 5000, eight seconds when $n$ = 10,000, and 15 seconds when $n$ = 20,000. Interestingly, there is a point of saturation, i.e., a compromise between accuracy and speed. We suggest that a good value for $n$ is 20,000, which is approximately when the difference in output precision is equivalent at two decimal places (see Table A1). Finally, as discussed earlier, we suggest that areas for solutions that contain internal crossings should be penalized. Our approach penalizes these graphs by not considering the internal area created by crossings (as shown in Fig. 5 of the main text).
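A minimal sketch of the Monte Carlo ratio, assuming points are drawn uniformly from a square problem space of side `b` = 100 and that membership is decided with an even-odd parity test. The function names and the `seed` parameter are hypothetical additions for reproducibility.

```python
import random

def point_in_polygon(px, py, polygon):
    """Even-odd test: toggle on each edge crossed by a ray towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge spans the ray's height
            x_i = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_i > px:
                inside = not inside
    return inside

def monte_carlo_area_ratio(polygon, n=20_000, b=100.0, seed=None):
    """Estimate the ratio of points inside the polygon to all n points."""
    rng = random.Random(seed)
    tau = sum(
        point_in_polygon(rng.uniform(0, b), rng.uniform(0, b), polygon)
        for _ in range(n)
    )
    return tau / n
```

Under an even-odd rule, a region enclosed twice by a self-crossing tour is counted as outside, which is one way a penalty for internal crossings can arise; whether this matches the authors' implementation exactly is an assumption.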

A.2. Calculating the number of edge crossings in solutions
The graphs are made up of nodes, which when connected form edges. These edges are straight lines and can be projected to follow the usual linear equation $y = mx + c$ by getting the slope and the intercept as shown in steps (vii) and (viii). To check if two edges in the graph intersect, one would need to (a) find where the intersection is between the two lines, and then (b) check if that intersection falls within the boundary space of the edges. Finding the intersections is another iterative process that requires the checking of every edge against every other edge. This can be expressed using two functions. The first function checks where the linear projections of the lines intersect using a system of two equations

$$y_I = m_1 x_I + c_1, \qquad y_I = m_2 x_I + c_2$$

which gives $x_I = (c_2 - c_1)/(m_1 - m_2)$ whenever $m_1 \neq m_2$. In the case of two vertical lines, two horizontal lines, or horizontal and vertical lines, we need to make exceptions. In the first two cases, our function would be undefined, so algorithmically coordinates that are outside the problem space can be returned, e.g., as negative numbers. In the last case the steps shown in (x) and (xi) would be followed.
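The two-step check (intersect the linear projections, then test whether the intersection lies within both edges) can be sketched as below. This is an illustration under assumptions: parallel projections return `None` instead of an out-of-bounds negative coordinate, adjacent edges that share a node are skipped, and a small tolerance `eps` absorbs floating-point error. All names are hypothetical.

```python
from itertools import combinations

def line_intersection(e1, e2):
    """Intersection of the lines through two edges, or None if parallel."""
    (x1, y1), (x2, y2) = e1
    (x3, y3), (x4, y4) = e2
    vertical1, vertical2 = x1 == x2, x3 == x4
    if vertical1 and vertical2:
        return None  # two vertical lines never intersect
    if vertical1:
        m2 = (y4 - y3) / (x4 - x3)
        return x1, m2 * (x1 - x3) + y3
    if vertical2:
        m1 = (y2 - y1) / (x2 - x1)
        return x3, m1 * (x3 - x1) + y1
    m1 = (y2 - y1) / (x2 - x1)
    m2 = (y4 - y3) / (x4 - x3)
    if m1 == m2:
        return None  # parallel lines (includes two horizontal lines)
    c1, c2 = y1 - m1 * x1, y3 - m2 * x3
    x = (c2 - c1) / (m1 - m2)
    return x, m1 * x + c1

def within_edge(edge, point, eps=1e-9):
    """True if the point lies inside the bounding box of the edge."""
    (x1, y1), (x2, y2) = edge
    x, y = point
    return (min(x1, x2) - eps <= x <= max(x1, x2) + eps
            and min(y1, y2) - eps <= y <= max(y1, y2) + eps)

def count_crossings(tour):
    """Count crossings between non-adjacent edges of a closed tour."""
    n = len(tour)
    edges = [(tour[i], tour[(i + 1) % n]) for i in range(n)]
    crossings = 0
    for (i, e1), (j, e2) in combinations(enumerate(edges), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # adjacent edges share a node, which is not a crossing
        point = line_intersection(e1, e2)
        if point is not None and within_edge(e1, point) and within_edge(e2, point):
            crossings += 1
    return crossings
```

Checking every edge against every other edge is quadratic in the number of nodes, which is manageable for graphs of 30 to 70 nodes.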
As an example, let us consider

Fig. A4. Graph containing no crossings (left), and crossings (right). The area is extracted using a Monte Carlo method and is simply the ratio of points inside the polygon to the total number of points (values shown in the figure: 0.522 and 0.481).

Table A1
Examples of differences in Area_pen estimates derived by the function w(τ, n).
Note. The Graph ID identifies solutions in our dataset, which are made available on GitHub (see Appendix 2 for the URL). Also, the ratio measurements of area are point estimates, and can differ slightly due to the stochastic nature of Monte Carlo. The algorithm works well for graphs with and without crossings in most cases, as shown in Fig. A4.

In this case, our analysis shows that the tour on the bottom-left is the most optimal in this set, since it requires less travelling. We are interested in how good you are at 'guessing' which tour is most optimal just by looking at four possible solutions.
Page 2
The process
The experiment consists of 50 trials. Each trial will start with a crosshair. After a second or so, you will be presented with four tours for a random graph. It's up to you to decide which of those you think is the best tour. In other words, it is the tour in which the salesperson travelled the least distance. Each trial is timed, so try your best to complete it as quickly as possible, without compromising the quality of your decision.

Fig. 3. Four heuristic solutions to the same randomly generated ETSP graph were coded to appear in each trial. The green shaded node is the start/finish node. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 5. Graph contains internal edge crossings (left). The Monte Carlo algorithm will penalize these graphs by ignoring the areas created internally (right). Note. The area formed within the polygon-enclosed area is not considered when using the Monte Carlo approach (described in Appendix 1). We suggest that these inner intersections mirror the lack of convexity in these solutions.

Fig. 6 .Fig. 7 .
Fig. 6. Crossings are highlighted (i.e., top-left graph has one crossing, and bottom-right has five). For each stimulus, perceptual activation for a geometric property is derived by subtracting the average quantity of said property in its neighbours from the quantity of that property in the stimulus.

Fig. 8. Random walk simulation using the average parameters of the drift-diffusion model. Note. The drift-rate for the area perceptual activation score is much steeper than the other two activators, further validating our mixed-effects model, which showed the perceptual score of area, with the associated penalty term, to have the strongest effect on target selection.

Fig. 9. The Receiver Operating Characteristic curve highlights the best trade-off between sensitivity and specificity (which is the cut-off at the top-left corner of the curve).
Fig. A2. A horizontal line 'drawn' on the point P, so that y_I = y(P).
Fig. A5. Two linear projections of edges will cross unless they are parallel lines.

Table 1
Description of heuristics used to produce the four solutions in the experiment.

Table 3
Comparison of four logistic models. BIC was used as the criterion for retention of variables that predict stimulus selection. Note. Bayes Factors were taken in relation to the most likely fit (i.e., the lowest BIC score), as shown by Dry et al. (2006). Model 3 was the best fitting model (highlighted).

Standardised coefficients and z values for the logistic mixed effects model.

Table 5
Correlation table of fixed effects of the full model (all variables).

Table 8
Parameters indicate the average starting bias, drift-rate, and selection boundary width for selection vs non-selection in our discrimination task according to the D*M drift-diffusion model.

Table 9
Regression modelling using BIC as the criterion for retention of variables that predict the likelihood of a successful trial.

Table 10
Confusion matrix of the predictive logit model shown in (v) with a cut-off of 0.7.