Visual Diagnostics for Constrained Optimisation with Application to Guided Tours

A guided tour helps to visualise high-dimensional data by showing low-dimensional projections along a projection pursuit optimisation path. Projection pursuit is a generalisation of principal component analysis, in the sense that different indexes are used to deﬁne the interestingness of the projected data. While much work has been done in developing new indexes in the literature, less has been done on understanding the optimisation. Index functions can be noisy, might have multiple local maxima as well as an optimal maximum, and are constrained to generate orthonormal projection frames, which complicates the optimization. In addition, projection pursuit is primarily used for exploratory data analysis, and ﬁnding the local maxima is also useful. The guided tour is especially useful for exploration, because it conducts geodesic interpolation connecting steps in the optimisation and shows how the projected data changes as a maxima is approached. This work provides new visual diagnostics for examining a choice of optimisation procedure, based on the provision of a new data object which collects information throughout the optimisation. It has helped to diagnose and ﬁx several problems with projection pursuit guided tour. This work might be useful more broadly for diagnosing optimisers, and comparing their performance. The diagnostics are implemented in the R package ferrn .


Introduction
Visualisation is widely used in exploratory data analysis (Tukey, 1977;Unwin, 2015;Healy, 2018;Wilke, 2019). Presenting information in graphics often unveils insights that would otherwise not be discovered and provides a more comprehensive understanding of the problem at hand. Task specific tools such as Li et al. (2020) show how visualisation can be used to understand, for instance, the behaviour of the optimisation for the example of neural network classification models. However, no general visualisation tool is available for diagnosing optimisation procedures. The work presented in this paper brings visualization tools into optimisation problems with the aim to better understand the performance of optimisers in practice. the problem consists in finding the minimum or maximum of a function f ∈ L p in the constrained space A, where L p defines the vector space of function f , whose pth power are integrable.
Gradient-based methods are commonly used to optimise an objective function, with the most notable one being the gradient ascent (descent) method. Although these methods are popular, they rely on the availability of the objective function derivatives. As will be shown in the next section, the independent variables in our optimisation problem are the entries of a projection matrix, and the computational time required to perform differentiation on a matrix could impede the rendering of tour animation. In addition, some objective functions rely on the empirical distribution of the data, which makes it in general not possible to get the gradient. Hence, gradient-based methods are not the focus of this paper and consideration will be given to derivative-free methods.
Derivative-free methods (Conn et al., 2009;Rios and Sahinidis, 2013), which do not rely on the knowledge of the gradient, are more generally applicable. Derivative-free methods have been developed over the years, where the emphasis is on finding, in most cases, a near optimal solution. Here we consider three derivative-free methods, two of which are random search methods: creeping random search and simulated annealing, and the other one is pseudo-derivative search.
Random search methods (Romeijn, 2009;Zabinsky, 2013;Andradóttir, 2015) have a random sampling component as part of their algorithm and have been shown to have the ability to optimise non-convex and non-smooth functions. The initial random search algorithm, pure random search (Brooks, 1958), draws candidate points from the entire space without using any information of the current position and updates the current position when an improvement on the objective function is made. As the dimension of the space becomes larger, sufficient sampling from the entire space would require a long time for convergence to occur, despite a guaranteed global convergence (Spall, 2005). Various algorithms have thus been developed to improve pure random search by either concentrating on a narrower sampling space or using a different updating mechanism. Creeping random search (White, 1971) is such a variation, where a candidate point is generated within a neighbourhood of the current point. This makes creeping random search faster to compute, however, global convergence is no longer guaranteed. On the other hand, simulated annealing (Kirkpatrick et al., 1983;Bertsimas and Tsitsiklis, 1993), introduces a different updating mechanism. Rather than only updating the current point when an improvement is made, simulated annealing uses a Metropolis acceptance criterion, where worse candidates still have a chance to be accepted. The convergence of simulated annealing algorithms has been widely researched (Mitra et al., 1986;Granville et al., 1994) and the global optimum can be attained under mild regularity conditions. Pseudo-derivative search uses a common search scheme in optimisation: line search. In line search methods, users are required to provide an initial estimate x 1 and, at each iteration, a search direction S k and a step size α k are generated. Then one moves on to the next point following x k+1 = x k + α k S k and the process is repeated until the desired convergence is reached. In derivative-free methods, local information of the objective function is used to determine the search direction. The choice of step size also needs consideration, as inadequate step sizes might prevent the optimisation method to converge to an optimum. An ideal step size can be chosen by finding the value of α k ∈ R that maximises f (x k + α k S k ) with respect to α k at each iteration.

Projection pursuit guided tour
A projection pursuit guided tour combines two different methods (projection pursuit and guided tour) to explore interesting features in a high-dimensional space. Projection pursuit, coined by Friedman and Tukey (1974), detects interesting structures (e.g. clustering, outliers and skewness) in multivariate data via low-dimensional projections. Guided tour (Cook et al., 1995) is one variation of a broader class of data visualisation methods, tour (Buja et al., 2005), which displays high-dimensional data through a series of animated projections. Let X n×p be the data matrix with n observations in p dimensions. A d-dimensional projection is a linear transformation from R p into R d defined as Y = X · A, where Y n×d is the projected data and A p×d is the projection matrix. We define f : R n×d → R to be an index function that maps the projected data Y onto a scalar value. This is commonly known as the projection pursuit index function, or just index function, and is used to measure the "interestingness" of a given projection. An interesting projection shows structures that are non-normal since theoretical proofs from Diaconis and Freedman (1984) have shown that projections tend to be normal, as n and p approach infinity under certain conditions. There have been many index functions proposed in the literature, here are a few examples: early indexes that can be categorised as measuring the L 2 distance between the projection and a normal distribution: Legendre index (Friedman and Tukey, 1974), Hermite index (Hall, 1989), and natural Hermite index (Cook et al., 1993); chi-square index (Posse, 1995) for detecting spiral structure; LDA index (Lee et al., 2005) and PDA (Lee and Cook, 2010) index for supervised classification; kurtosis Figure 1: An illustration for demonstrating the frames in a tour path. Each square (frame) represents the projected data with a corresponding basis. Blue frames are returned by the projection pursuit optimisation and white frames are constructed between two blue frames by geodesic interpolation.
As a general visualisation method, tour produces animations of high-dimensional data via rotations of low-dimensional planes. There are different versions depending on how the high-dimensional space is investigated: grand tour (Cook et al., 2008) selects the planes randomly to provide a general overview; manual tour (Cook and Buja, 1997) gradually phases in and out one variable to understand the contribution of that variable in the projection. Guided tour, the main interest of this paper, chooses the planes with the aid of projection pursuit, to gradually reveal the most interesting projection. Given a random start, projection pursuit iteratively finds bases with higher index values and the guided tour constructs a geodesic interpolation between these planes to form a tour path. Figure 1 shows a sketch of the tour path where the blue squares represent planes (targets) selected by the projection pursuit optimisation and the white squares represent planes in the geodesic interpolation between targets. Mathematical details of the geodesic interpolation can be found in Buja et al. (2005). (Note that the term frame used in Buja's paper to refer to a particular set of orthonormal vectors defining a plane. This is also conventionally referred to as a basis, which is used in this paper and the associated software.) The aforementioned tour method has been implemented in the R package tourr (Wickham et al., 2011).

Optimisation in the tour
In projection pursuit the optimisation aims at finding the global and local maxima that give interesting projections according to an index function. That is, it starts with a given randomly selected basis A 1 and aims at finding an optimal final projection basis A T that satisfies the following optimisation problem: where f and X are defined as in the previous section, A is the set of all p-dimensional projection bases, I d is the d-dimensional identity matrix, and the constraint ensures the projection bases, A, to be orthonormal. It is worth noticing the following: 1) The optimisation is constrained and the orthonormality constraint imposes a geometrical structure on the bases space: it forms a Stiefel manifold. 2) There may be index functions for which the objective function might not be differentiable.
3) While finding the global optimum is the goal of the optimisation problem, interesting projections may also appear in the local optimum. 4) The optimisation should be fast to compute since the tour animation is viewed by the users during the optimisation.

Existing algorithms
Three optimisers have been implemented in the tourr (Wickham et al., 2011) package: creeping random search (CRS), simulated annealing (SA), and pseudo-derivative (PD). Creeping random search (CRS) is a random search optimiser that samples a candidate basis A l in the neighbourhood of the current basis A cur by A l = (1 − α)A cur + αA rand where α ∈ [0, 1] controls the radius of the sampling neighbourhood and A rand is generated randomly. A l is then orthonormalised to fulfil the basis constraint. If A l has an index value higher than the current basis A cur , the optimiser outputs A l for guided tour to construct an interpolation path. The neighbourhood parameter α is adjusted by a cooling parameter: α j+1 = α j * cooling before the next iteration starts. The optimiser terminates when the maximum number of iteration l max is reached before a better basis can be found. The algorithm of CRS can be found in the appendix. Posse (1995) has proposed a slightly different cooling scheme by introducing a halving parameter c. In his proposal α is only adjusted if the last iteration takes more than c times to find a better basis.
Simulated annealing (SA) uses the same sampling process as CRS but allows a probabilistic acceptance of a basis with lower index value than the current one. Given an initial value of T 0 ∈ R + , the "temperature" at iteration l is defined as T(l) = T 0 log(l+1) . When a candidate basis fails to have an index value larger than the current basis, SA gives it a second chance to be accepted with probability where I (·) ∈ R denotes the index value of a given basis. This implementation allows the optimiser to make a move and explore the basis space even if the candidate basis does not have a higher index value and hence enables the optimiser to jump out of a local optimum. The second algorithm in the appendix highlights how SA differs from CRS in the inner loop.
Pseudo-derivative (PD) search uses a different strategy than CRS and SA. Rather than randomly sample the basis space, PD first computes a search direction by evaluating bases close to the current basis. A step size is then chosen along the corresponding geodesic by another optimisation over a 90 degree angle from −π/4 to π/4. The resulting candidate basis A * * is returned for the current iteration if it has a higher index value than the current one. The third algorithm in the appendix summarises the inner loop of the PD.

Visual diagnostics
A data structure for diagnosing optimisers in projection pursuit guided tour is first defined. With this data structure, four types of diagnostic plots are presented.

Data structure for diagnostics
Three main pieces of information are recorded during the projection pursuit optimisation: 1) projection bases A, 2) index values I, and 3) state S. For CRS and SA, possible states include random_search, new_basis, and interpolation. Pseudo-derivative (PD) has a wider variety of states including new_basis, direction_search, best_direction_search, best_line_search, and interpolation. Multiple iterators index the information collected at different levels: t is a unique identifier prescribing the natural ordering of each observation; j and l are the counter of the outer and inner loop respectively. Other parameters of interest recorded, V, include method that tags the name of the optimiser, and alpha that indicates the sampling neighbourhood size for searching observations. A matrix notation of the data structure is presented in the table below.
Note that there is no output in iteration J + 1 since the optimiser does not find a better basis in the last iteration and terminates. The final basis found is A T with index value I T .
The data structure constructed above meets the tidy data principle (Wickham, 2014) that requires each observation to form a row and each variable to form a column. With tidy data structure, data wrangling and visualisation can be significantly simplified by well-developed packages such as dplyr (Wickham et al., 2020) and ggplot2 (Wickham, 2016).

Diagnostic 1: Checking how hard the optimiser is working
A starting point of diagnosing an optimiser is to understand how many searches it has conducted, i.e. we want to summarise how the index is increasing over iterations and how many basis need to be sampled at each iteration. This is achieved using the function explore_trace_search(): a boxplot shows the distribution of index values for each try, where the accepted basis (corresponding to the highest index value) is always shown as a point. In the case of few tries at a given iteration, showing the data points directly may be preferred over the boxplot, this is controlled via the cutoff argument. Additional annotations are added to facilitate better reading of the plot and these include 1) the number of points searched in each iteration can be added as text label at the bottom of each iteration; 2) the anchor bases to interpolate are connected and highlighted in a larger size; and 3) the colour of the last iteration is in a greyscale to indicate no better basis found in this iteration. Figure 2 shows an example of the search plot for CRS (left) and SA (right). Both optimisers quickly find better bases in the first few iterations and then take longer to find one in the later iterations. The anchor bases, the ones found with the highest index value in each iteration, always have an increased index value in the optimiser CRS while this is not the case for SA. This feature gives CRS an advantage in this simple example to quickly find the optimum.

Diagnostic 2: Examining the optimisation progress
Another interesting feature to examine is the changes in the index value between interpolating bases since the projection on these bases is shown in the tour animation. Trace plots are created by plotting the index value against time. Figure 3 presents the trace plot of the same optimisations as Figure  2 and one can observe that the trace is smooth in both cases. It may seem bizarre at first sight that the interpolation sometimes passes bases with higher index values before it decreases to a lower target. This happens because, on the one hand, the probabilistic acceptance in SA implies that some worse bases will be accepted by the optimiser. In addition, the guided tour interpolates between the current and target basis to provide a smooth transition between projections, and sometimes a higher index value will be observed along the interpolation path. This indicates that a non-monotonic interpolation cannot be avoided, even for CRS. Later, in Section A problem of non-monotonicity, there will be a discussion on improving the non-monotonic interpolation for CRS.

Diagnostic 3a: Understanding the optimiser's coverage of the search space
Apart from checking the search and progression of an optimiser, looking at where the bases are positioned in the basis space is also of interest. Given the orthonormality constraint, the space of projection bases A p×d is a Stiefel manifold. For one-dimensional projections, this forms a pdimensional sphere. A dimensionality reduction method, e.g. principal component analysis is applied to first project all the bases onto a 2D space. In a projection pursuit guided tour optimisation, there are various types of bases involved: 1) The starting basis; 2) The search bases that the optimiser evaluated to produce the anchor bases; 3) The anchor bases that have the highest index value in each iteration; 4) The interpolating bases on the interpolation path; and finally, 5) the end basis. The importance of these bases differs and the most important ones are the starting, interpolating, and end bases. Sometimes, two optimisers can start with the same basis but finish with bases of opposite sign. This happens because the projection is invariant to the orientation of the basis and so is the index value. However, this creates difficulties for comparing the optimisers since the end bases will be symmetric to the origin. A sign flipping step is conducted to flip the signs of all the bases in one routine if different optimisations finish at opposite places.
Several annotations have been made to help understanding this plot. As mentioned previously, the original basis space is a high-dimensional sphere and random bases on the sphere can be generated via the geozoo (Schloerke, 2016) package. We use PCA to project and visualize the parameters/ bases in 2D. The centre of the 2D view is the first two PCs of the data matrix. It theoretically should be a circle, but may have some irregular edges due to finite sampling. Thus the edge is smoothed by using a radius estimated as the largest distance from the centre to any basis. In the simulation, the theoretical best basis is known and can be labelled to compare how close to this that the optimisers stopped. Various aesthetics, i.e. size, alpha (transparency), and colour, are applicable to emphasize critical elements and adjust for the presentation. For example, anchor points and search points are less important and hence a smaller size and alpha are used. Alpha can also be applied on the interpolation paths to show the start to finish from transparent to opaque. Figure 4 shows the PCA plot of CRS and PD for a 1D projection problem. Both optimisers find the optimum but PD gets closer. With the PCA plot, one can visually appreciate the nature of these two optimisers: PD first evaluates points in a small neighbourhood for a promising direction, while CRS evaluates points randomly in the search space to search for the next target. There are dashed lines annotated for CRS and it describes the interruption of the interpolation, which will be discussed in detail in Section A problem of non-monotonicity.

Diagnostic 3b: Animating the diagnostic plots
An animation is another type of display to show how the search progresses from start to finish in the space. Figure 5 shows the animated version (six frames from the animation if viewed in pdf) of the PCA plot in Figure 4. An additional piece of information one can learn from this animation is that CRS finds its end basis quicker than PD since CRS finishes its search in the 5th frame while PD is still making more progress.

Diagnostic 4a: The tour looking at itself
As mentioned previously, the original p × d dimension space can be simulated via randomly generated bases in the geozoo (Schloerke, 2016) package. While the PCA plot projects the bases from the direction that maximises the variance, the tour plot displays the original high-dimensional space from various directions using animation. Figure 6 shows some frames from the tour plot of the same two optimisations in its original space.

Diagnostic 4b: Forming a torus
While the previous few examples have looked at the space of 1D basis in a unit sphere, this section visualises the space of 2D basis. Recall that the columns in a 2D basis are orthogonal to each other, so the space of p × 2 bases is a torus in the p-D space (Buja and Asimov, 1986). For p = 3 one would see a classical 3D torus shape as shown by the grey points in Figure 7. The two circles of the torus can be observed to be perpendicular to each other and this can be linked back to the orthogonality condition. Two paths from CRS and PD are plotted on top of the torus and coloured in green and brown respectively to match the previous plots. The final basis found by PD and CRS are shown in a larger shape and printed below respectively: Both optimisers have found the third variable in the first direction, and the second variable in the second direction. Note however the different orientation of the basis, following from the different Figure 6: Six frames selected from rotating the high-dimensional basis space, along with the same two search paths from Figure 4 and 5. The basis space in this example is a 5D unit sphere, on which points (grey) are randomly generated via the CRAN package geozoo. The full animation can be seen in the html version of the paper. sign in the second column. One would expect to see this in the torus plot as the final bases match each other when projected onto one torus circle (due to the same sign in the first column) and are symmetric when projected onto the other (due the sign difference in the second column). In Figure 7, this can be seen most clearly in frame 5 where the two circles are rotated into a line from our view.

Diagnosing an optimiser
In this section, several examples will be presented to show how the diagnostic plots discover something unexpected in projection pursuit optimisation, and guide the implementation of new features.

Simulation setup
Random variables with different distributions have been simulated and the distributional form of each variable is presented in equations 2 to 8. Variables x1, x8 to x10 are normal noise with zero mean and unit variance and x2 to x7 are normal mixtures with varied weights and locations. All the variables have been scaled to have an overall unit variance before projection pursuit. The holes index (Cook et al., 2008), used for detecting bimodality of the variables, is used throughout the examples unless otherwise specified.

A problem of non-monotonicity
An example of non-monotonic interpolation has been given in Figure 3: a path that passes bases with a higher index value than the target one. For SA, a non-monotonic interpolation is justified since target Figure 7: Six frames selected from rotating the 2D basis space along with two search paths optimised by PD (brown) and CRS (green). The projection problem is a 2D projection with three variables using the holes index. The grey points are randomly generated 2D projection bases in the space and it can be observed that these points form a torus. The full video of the animation can be found in the html version of the paper.
bases do not necessarily have a higher index value than the current one, while this is not the case for CRS. The original trace plot for a 2D projection problem, optimised by CRS, is shown on the left panel of Figure 8 and one can observe that the non-monotonic interpolation has undermined the optimiser to realise its full potential. Hence, an interruption is implemented to stop at the best basis found in the interpolation. The right panel of Figure 8 shows the trace plot after implementing the interruption and while the first two interpolations are identical, the basis at time 61 has a higher index value than the target in the third interpolation. Rather than starting the next iteration from the target basis on time 65, CRS starts the next iteration at time 61 on the right panel and reaches a better final basis.

Close but not close enough
Once the final basis has been found by an optimiser, one may want to push further in the close neighbourhood to see if an even better basis can be found. A polish search takes the final basis of an optimiser as the start of a new guided tour to search for local improvements. The polish algorithm is similar to the CRS but with three distinctions: 1) a hundred rather than one candidate bases are generated each time in the inner loop; 2) the neighbourhood size is reduced in the inner loop, rather than in the outer loop; and 3) three more termination conditions have been added to ensure the new basis generated is distinguishable from the current one in terms of the distance in the space, the relative change in the index value, and neighbourhood size: 1) the distance between the basis found and the current needs to be larger than 1e-3; 2) the relative change of the index value needs to be larger than 1e-5; and 3) the alpha parameter needs to be larger than 0.01 Figure 9 presents the projected data and trace plot of a 2D projection, optimised by CRS and followed by the polish step. The top row shows the initial projection, the final projection after CRS, and the final projection after polish, respectively The end basis found by CRS reveals the four clusters in the data, but the edges of each cluster are not clean-cut. Polish works with this end basis and further pushes the index value to produce clearer edges of the cluster, especially along the vertical axis.

Seeing the signal in the noise
The holes index function used for all the examples before this section produces a smooth interpolation, while this is not the case for all the indexes. An example of a noisy index function for 1D projections compares the projected data, Y n×1 , to a randomly generated normal distribution, N n×1 , using the Kolmogorov test. Let F . (n) be the ECDF function (empirical cummulative distribution function), with two possible subscripts Y and N representing the projected and randomly generated data, and n denoting the number of observation, the Kolmogorov index I nk (n), is defined as:  Figure 9: Comparison of the projected data before and after using polishing for a 2D projection problem on boa6 data using holes index. The top row shows the initial projected data and the final views after CRS and polish search and the second row traces the index value. The clustering structure in the data is detected by CRS (top middle panel) but the polish step improves the index value and produces clearer boundaries of the clusters (top right panel), especially along the vertical axis. Note that the parameter max.tries is set to 400 in this experiment for CRS to do its best.  Figure 10: Comparison of the three optimisers in optimising I nk (n) index for a 1D projection problem on a five-variable dataset, boa5. Both CRS and SA succeed in the optimisation, PD fails to optimise this non-smooth index. Further, SA takes much longer than CRS to finish the optimisation, but finishes off closer to the theoretical best.
With a non-smooth index function, two research questions are raised: 1) whether any optimiser fails to optimise this non-smooth index; and 2) whether the optimisers can find the global optimum despite the presence of a local optimum Figure 10 presents the trace and PCA plots of all three optimisers and as expected, none of the interpolated paths are smooth. There is barely any improvement made by PD, indicating its failure in optimising non-smooth index functions. While CRS and SA have managed to get close to the index value of the theoretical best, the trace plot shows that it takes SA much longer to find the final basis. This long interpolation path is partially due to the fluctuation in the early iterations, where SA tends to generously accept inferior bases before concentrating near the optimum. The PCA plot shows the interpolation path and search points excluding the last termination iteration. Pseudo-Derivative (PD) quickly gets stuck near the starting position. Comparing the amount of random search done by CRS and SA, it is surprising that SA does not carry as many samples as CRS. Combining the insights from both the trace and PCA plot, one can learn the two different search strategies by CRS and SA: CRS frequently samples in the space and only make a move when an improvement is guaranteed to be made, while SA first broadly accepts bases in the space and then starts the extensive sampling in a narrowed subspace. The specific decision of which optimiser to use will depend on the index curve in the basis space but if the basis space is non-smooth, accepting inferior bases at first, as SA has done here, can lead to a more efficient search, in terms of the overall number of points evaluated.
The next experiment compares the performance of CRS and SA when a local maximum exists. Two search neighbourhood sizes: 0.5 and 0.7 are compared to understand how a large search neighbourhood would affect the final basis and the length of the search. Figure 11 shows 80 paths simulated using 20 seeds in the PCA plot, faceted by the optimiser and search size. With CRS and a search size of 0.5, despite being the simplest and fastest, the optimiser fails in three instances where the final basis lands neither near the local nor the global optimum. With a larger search size of 0.7, more seeds have found the global maximum. Comparing CRS and SA for a search size of 0.5, SA does not seem to improve the final basis found, despite having longer interpolation paths. However, the denser paths near the local maximum is an indicator that SA is working hard to examine if there is any other optimum in the basis space but the relatively small search size has diminished its ability to reach the global maximum. With a larger search size, almost all the seeds (16 out of 20) have found the global maximum and some final bases are much closer to the theoretical best, as compared to the three other cases. This indicates that SA, with a reasonable large search window, is able to overcome the local optimum and optimise close towards the global optimum.

Reconciling the orientation
One interesting situation observed in the previous examples is that, for some simulations, as shown on the left panel of Figure 12, the target basis is generated on the other half of the basis space and the interpolator seems to draw a straight line to interpolate. Bases with opposite signs do not affect the projection and index value, but we would prefer the target to have the same orientation as the current basis. The orientation of two bases can be computationally checked by calculating the determinant -a negative value suggests the two bases having different orientation. For 1D bases, this can be corrected by flipping the sign in one basis. For higher dimensions, it can be a bit more difficult because the orthonormality of the basis needs to be maintained also when an individual vector is flipped. Here an orientation check is carried out once a new target basis is generated and the sign in the target basis will be flipped if a negative determinant is obtained. The interpolation after implementing the orientation check is shown on the right panel of Figure 12 where the unsatisfactory interpolation no longer exists.

Implementation
This project contributes to the software development in two packages: a data collection object is implemented in tourr (Wickham et al., 2011), while the visual diagnostics of the optimisers is implemented in ferrn (Zhang et al., 2021). The functions in the ferrn (Zhang et al., 2021) package are listed below: original flipped original flipped Figure 12: Comparison of the interpolation in the PCA-projected basis space before and after reconciling the orientation of the target basis. Optimisation is on the 1D projection index, I nk (n), for boa6 data using CRS with seed 2463. The dots represent the target basis in each iteration and the path shows the interpolation. On the left panel, one target basis is generated with an opposite orientation to the current basis (hence appear on the other side of the basis space) and the interpolator crosses the origin to perform the interpolation. The right panel shows the same interpolation after implementing an orientation check and the undesirable interpolation disappears.
-explore_trace_interp() produces trace plots for the interpolation points in Figure 3.
-explore_space_pca() produces the PCA plot of projection bases on the reduced space. Figure 4 includes the additional details of anchor and search bases, which can be turned on by the argument details = TRUE. The animated version in Figure 5 is produced with argument animate = TRUE. -explore_space_tour() produces animated tour view on the full space of the projection bases in Figure 6.
-add_dir_search() for plotting the directional search bases with magnified distance -add_end() for plotting end bases -add_interp() for plotting the interpolation path -add_interp_last() for plotting the last interpolation bases for comparing with target bases when interruption is used -add_interrupt() for linking the last interpolation bases with target ones when interruption is used -add_search() for plotting search bases -add_space() for plotting the circular space -add_start() for plotting start bases -add_theo() for plotting theoretical best bases, if applicable • Utilities -theme_fern() and format_label() for better display of the grid lines and axis formatting -clean_method() to clean up the name of the optimisers -botanical_palettes() is a collection of colour palettes from Australian native plants.
Quantitative palettes include daisy, banksia, and cherry and sequential palettes contain fern and acacia -botanical_pal() as the colour interpolator -scale_color_*() and scale_fill_*() for scaling the colour and fill of the plot

Conclusion
This paper has provided several visual diagnostics that can be used for understanding a complex optimisation procedure and are implemented in the ferrn package. The methods were illustrated using the optimisers available for projection pursuit guided tour. Here the constraint is the orthonormality condition of the projection bases, which corresponds to optimisation over spheres and torii. The approach described broadly applies to other constrained optimisers. Although the manifold in p-space might be different the diagnostic techniques are the same. A researcher would begin by saving the path of the optimiser in a form required to input into the ferrn package, as described in this paper. One might generally make more samples from the constrained space upon which to assess and compare the optimisation paths. These high-dimensional data objects can then be viewed using the tour.
The progressive optimisation of a target function and it's coverage of the search space can be viewed in both reduced 2D space and the full space. These visualisations can lead to insights for evaluating and comparing the performance of multiple optimisers operating on the same task. They can provide better understanding of existing methods or motivate the development of new approaches. For example, we have compared how three optimisers perform when maximising a non-smooth index function and illustrated how the pseudo-derivative search fails in this setting. The observations from our experiments have also been translated into improved optimisation methods for the guided tour, e.g. we introduced the option to interrupt the search if a better basis is found along the path. This work might be considered an effort to bring transparency into algorithms. Although little attention is paid by algorithm developers to providing ways to output information during intermediate steps, this is an important component for enabling others to understand and diagnose the performance. Algorithms are an essential component of artificial intelligence that is used to make daily life easier. Interpretability of algorithms is important to guard against aspects like bias and inappropriate use. The data object underlying the visual diagnostics here is an example of what might be useful in algorithm development generally.