
1 Introduction

The Traveling Salesman Problem (TSP) is a well-known NP-hard problem that asks for a permutation tour allowing a salesman to visit each city exactly once and return to his starting city. The Kohonen Self-Organizing Map (SOM) [1,2,3,4,5] and 2-opt local search have previously been shown to work together to produce an intermediate Euclidean TSP solution with a good trade-off between quality and execution time [6]. For further acceleration, parallel strategies for SOM [7,8,9,10,11,12,13] and for 2-opt [14,15,16,17] have been studied separately over the past two decades, for different applications and on various multi-threaded devices (SP2, GPU). However, problems remain in these previous works concerning the types of topology supported, the level of parallelism and the memory occupancy.

Considering the general SOM working on a predefined topological map, its computation time can be divided into three parts: the time required to determine the winner node closest to the present input; the time required to determine the winner node's neighborhood; and the time required to update the weights (map updating) [8]. The first operation can be accelerated by using a massively parallel local spiral search to simultaneously find the closest winner node for each input datum [13, 18]. The second and third operations are influenced by the network architecture of the topological map, which provides access to neighboring nodes during the SOM training process. Topological maps are usually constructed with a unidirectional link network or a buffer grid, as in the SOM applications of [6, 11, 13, 18]. However, training on a unidirectional network can only proceed in one direction. Algorithms using a 2D grid can access arbitrary neighboring nodes within the needed radius centered on a winner node, but the topological map is then limited to regular topology structures, as in [6, 21]. Besides, the method used to access neighboring nodes on different topological maps plays an important role for applications that require the initial topological relationship between nodes to be preserved during the SOM training process.

Previous parallel 2-opt implementations cannot be considered massively parallel methods because of the tour ordering requirement. Examples are tailored data parallelism, in which one thread optimizes one partial tour [14]; function parallelism, in which the optimization of a single edge is executed in parallel [16, 19]; and geometric parallelism, in which one thread searches for optimizations within one of several divided sub-areas [20].

To solve these problems, in this paper we propose a platform that implements the SOM training process on irregular topological maps while preserving the topological relationship between nodes, together with a modified 2-opt method that performs 2-opt in a massively parallel way. We test this platform on GPU, taking advantage of its multi-threaded operation on global memory.

The rest of the paper is organized as follows: Sect. 2 presents related work on parallel SOM and 2-opt; Sect. 3 presents the proposed platform, including the parallel SOM working on topological maps and the modified 2-opt framework with one concrete implementation; the last section presents the experiments and discussion.

2 Related Work

For parallel implementations of SOM, two common parallel techniques, network partitioning and data partitioning, have been used on both CPU- and GPU-based systems and have been discussed and compared in various works [7,8,9,10,11,12,13]. The main difference between these two techniques is whether the node update step (map updating) happens independently in different parts of the network, or simultaneously for all input data after the training step. The network partitioning technique breaks the map into parts, and each part has a thread that updates its neurons separately. With the data decomposition technique, map updating does not occur until every input datum has found its winner node and trained its neighboring nodes [12]. For TSP applications, the map updating step corresponds to the relocation of each neuron in Euclidean space.

For parallel SOM specialized to TSP applications, Wang et al. [21] propose a parallel SOM implementation based on the GPU cellular matrix model proposed by Zhang et al. [18], in which they use a buffer grid as the neural network structure and assign one thread to treat all input data in one cell sequentially. Their SOM training step works on the integral buffer grid in global memory [21]. Different from that work, the network in this paper is stored as a doubly linked network for both SOM and 2-opt instead of intermediately in a cellular matrix; the cellular matrix model is only used for the parallel local spiral search operator; and we assign one thread per city instead of one thread per cell.

For parallel 2-opt algorithms, Johnson et al. [15, 19] discussed parallel schemes such as "geometric partitioning and tour-based partitioning". One geometric partitioning scheme proposed by Karp [20] is based on a recursive subdivision of the overall region containing the cities into rectangles [19]. Verhoeven et al. [14] distinguished parallel 2-opt algorithms between data and function parallelism, and proposed a tour repartitioning scheme that guarantees their algorithm does not halt until it has found a minimum for the complete problem [14]. Van Luong et al. [17] and Rocki and Sudha [16] adopt parallel strategies similar to "function parallelism", meaning that one sequential 2-opt is executed in parallel: Rocki and Sudha [16] distribute the calculation of one edge's exhaustive 2-opt optimization between threads, but only after the first edge's optimization has finished does the second edge begin its parallel 2-opt optimization.

3 Proposed Methods

The outline of the proposed parallel platform for SOM and 2-opt applied to large scale TSP is shown in Algorithm 1. It mainly includes three parts: the initialization step, the parallel SOM and the parallel 2-opt. Both algorithms work on topological maps constructed as a doubly linked network, as shown in Figs. 2(b) and 3. "Doubly linked" means that if node A connects (buffers) node B, then node B necessarily connects (buffers) node A, and every node only buffers its directly connected nodes.
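As an illustration, a minimal C++ sketch of such a doubly linked node might look as follows (the identifiers are ours and not taken from the paper's implementation):

```cpp
#include <vector>

// Minimal sketch of one node of the doubly linked network
// (identifiers are illustrative, not from the paper's implementation).
struct Node {
    float x, y;              // position in the Euclidean plane
    std::vector<int> links;  // indices of directly connected (buffered) nodes
};

// "Doubly linked" invariant: if b appears in nodes[a].links,
// then a also appears in nodes[b].links.
void connect(std::vector<Node>& nodes, int a, int b) {
    nodes[a].links.push_back(b);
    nodes[b].links.push_back(a);
}
```

For the TSP tour of Fig. 2(b) every node keeps exactly two such links, while the maps of Fig. 3 may attach a different number of links to each node.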

Algorithm 1. Outline of the proposed parallel platform for SOM and 2-opt.
Fig. 1. Tour ordering plays an important role for 2-opt optimization.

Fig. 2. Comparison of the operations needed after the same 2-exchange under different TSP tour order representations. (a) The algorithm needs extra temporary memory to invert the tour ordering for each 2-exchange. (b) The algorithm only needs to change the links of the four related cities and can easily go in two opposite directions from the current edge to find possible local optimizations.

Fig. 3. Topological maps represented by a doubly linked network. (a) Doubly linked list. (b, c, d) Rhombus, hexagonal and irregular topologies that respect the needed topological properties. They share the same proposed method for accessing neighboring nodes during one SOM training step, which preserves the topological relationship between nodes.

3.1 Initialization

The initialization step mainly includes the preparation of the SOM maps with initialized neurons, the cellular matrices for local spiral search, and the necessary data transmission from the CPU to the GPU side.

The initial SOM maps can be predefined with the property that topologically close nodes correspond to close pixels in an image, as shown in Fig. 3, or without such a property, for example by simply adding two neighboring nodes to each neuron to form an input TSP solution, as shown in Fig. 2(b). For a TSP instance with N cities (the "trainer"), the artificial doubly linked neural network (the "learner") is initialized with \(2\times {N}\) neurons [5]. The coordinates of these initial neurons are set according to the input cities with a small random difference.
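A possible sketch of this initialization is given below; the magnitude of the random offset is our own assumption, since the text only specifies "a small random difference":

```cpp
#include <random>
#include <vector>

struct Point { float x, y; };

// Create 2*N neurons, two per input city, each placed at the city position
// plus a small random offset. The offset magnitude eps is an illustrative
// choice, not a value given in the paper.
std::vector<Point> initNeurons(const std::vector<Point>& cities, float eps) {
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> d(-eps, eps);
    std::vector<Point> neurons;
    neurons.reserve(2 * cities.size());
    for (const Point& c : cities) {
        neurons.push_back({c.x + d(rng), c.y + d(rng)});
        neurons.push_back({c.x + d(rng), c.y + d(rng)});
    }
    return neurons;
}
```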

One basic building block in this paper is the local spiral search operator [22] in 2-dimensional Euclidean space. This operator works on a cellular matrix [18, 23] to find the closest neighboring nodes for the proposed methods. The Euclidean area containing all N input points is partitioned into a cellular matrix in which each cell has its own coordinate and buffers the nodes lying in the corresponding partitioned area. For the spiral search operator used by SOM, the algorithm prepares two cellular matrices, one for the input data (the "trainer") and one for the neurons (the "learner").
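For illustration, mapping a point to its cell coordinate on a uniform cellular matrix could be done as follows (the bounding box origin and cell size are assumed to be known; identifiers are ours):

```cpp
#include <cmath>

// Cell coordinate of a point on a uniform cellular matrix. The bounding box
// origin (minX, minY) and the cell edge length are assumed known;
// identifiers are illustrative.
struct CellCoord { int i, j; };

CellCoord cellOf(float x, float y, float minX, float minY, float cellSize) {
    return { static_cast<int>(std::floor((x - minX) / cellSize)),
             static_cast<int>(std::floor((y - minY) / cellSize)) };
}
```

Each cell then buffers the indices of the nodes whose coordinates map to it; one such matrix is built for the cities and one for the neurons.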

After these steps, the input data, the topological maps and the two cellular matrices are copied to the GPU side for parallel computing.

3.2 Parallel Self-organizing Map

Our parallel implementation of SOM contains the following three steps in one epoch (iteration, epo), as shown in Fig. 4.

1. Search winner node: spiral search for each input point's closest neuron;

2. Train neighborhood: iteratively access the directly connected neighboring nodes according to topological distance and apply Kohonen's learning law;

3. Refresh cellular matrix: refresh the position of each neuron on the cellular matrix.

Fig. 4. Kernel functions of the parallel implementation of SOM; we assign one thread to one node.

The search winner node step consists of a massively parallel local spiral search working on the cellular matrix, with one thread per city. For each input city, the algorithm gets the coordinate of the cell where this city lies in the "trainer's" cellular matrix, and tries to find this city's closest neuron in the "learner's" cellular matrix, starting from the same cell coordinate. If this cell of the "learner's" cellular matrix contains no neurons, the algorithm searches the neighboring cells one by one in a spiral manner centered on the starting cell. Every search stops at radius \((lssr + 1)\) or at the maximum search range in the cellular matrix (LSSR), where lssr is the radius at which the operator encounters the first closest neuron. It has been proved that a single spiral search operation on a bounded or uniform data distribution takes on average O(1) time to find the closest point [22].
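A simplified, sequential sketch of this local spiral search is given below; in the GPU version one such search is executed per thread, and the ring-by-ring traversal and identifiers are our own illustration:

```cpp
#include <algorithm>
#include <cstdlib>
#include <limits>
#include <vector>

struct Point { float x, y; };

// Sequential sketch of one local spiral search. cellBuckets[i][j] lists the
// neurons lying in cell (i, j) of the "learner's" cellular matrix; LSSR
// bounds the search radius in cells. Identifiers are illustrative.
int spiralSearch(const Point& q, int ci, int cj,
                 const std::vector<std::vector<std::vector<int>>>& cellBuckets,
                 const std::vector<Point>& neurons, int LSSR) {
    int best = -1;
    float bestD = std::numeric_limits<float>::max();
    int stop = LSSR;                      // shrinks to lssr + 1 once a neuron is met
    const int rows = static_cast<int>(cellBuckets.size());
    for (int r = 0; r <= stop; ++r) {
        // visit the ring of cells at Chebyshev distance r around (ci, cj)
        for (int i = ci - r; i <= ci + r; ++i)
            for (int j = cj - r; j <= cj + r; ++j) {
                if (std::max(std::abs(i - ci), std::abs(j - cj)) != r) continue;
                if (i < 0 || j < 0 || i >= rows ||
                    j >= static_cast<int>(cellBuckets[i].size())) continue;
                for (int n : cellBuckets[i][j]) {
                    float dx = neurons[n].x - q.x, dy = neurons[n].y - q.y;
                    float d = dx * dx + dy * dy;
                    if (d < bestD) { bestD = d; best = n; }
                }
            }
        if (best != -1 && stop == LSSR) stop = std::min(LSSR, r + 1);
    }
    return best;   // index of the closest neuron found, or -1 if none within LSSR
}
```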

The train neighborhood step in one epoch follows the Kohonen learning law given in Eq. 1. Considering the current iteration epo, a winner node \(p^*\) has been found in the previous step for one input city p; the Kohonen learning law is then applied to \(p^*\) and to the neurons within a finite neighborhood of \(p^*\) of topological radius r. The topological distance from the winner node, \(d_G\), and the learning rate \(\alpha (epo)\) determine the learning force applied to each neuron; p is the Euclidean position of the input city, \(w_k(epo)\) is the Euclidean position of a neuron, and k indexes the different neighborhood neurons. The proposed method applies this learning law on the doubly linked network instead of on a 2D grid, as shown in Figs. 3 and 5(b). After the training step of one iteration, the algorithm updates the SOM parameters for the next iteration by decreasing the learning rate \(\alpha (epo)\) and the training radius r.

$$\begin{aligned} w_{k}(epo+1)= w_{k}(epo) + \alpha (epo)\times exp(-d_{G}(p^*, k)^2 / r^{2} )\times (p-w_{k}(epo)) \end{aligned}$$
(1)
Fig. 5. SOM training procedure on two different network architectures. A black node is one input point (a pixel or one city); the red node is its winner node. (Color figure online)

We should emphasize that the train neighborhood operation accesses all needed neighboring neurons iteratively, one "circle" after another, according to their topological distance from the winner node. As shown in Fig. 5(b), only after all green neurons have been trained does the training procedure begin to train the blue neurons. This iterative operation ensures that neurons with a larger topological distance \(d_G\) do not move closer to the winner node than neurons with a smaller \(d_G\) during one training process, which preserves the predefined topological relationship between nodes.
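A minimal sketch of this circle-by-circle training on the doubly linked network is shown below, using a breadth-first traversal from the winner node to obtain the topological distances; identifiers and the exact radius test are our own assumptions:

```cpp
#include <cmath>
#include <queue>
#include <vector>

struct Node { float x, y; std::vector<int> links; };

// Simplified sketch of one train-neighborhood step: starting from the winner
// node, a breadth-first traversal of the doubly linked network visits the
// neighbors circle by circle (by topological distance d_G) up to radius rMax
// and applies Eq. (1) to each visited neuron. Identifiers are illustrative.
void trainNeighborhood(std::vector<Node>& net, int winner,
                       float px, float py, float alpha, float rMax) {
    std::vector<int> dG(net.size(), -1);   // topological distance from the winner
    std::queue<int> q;
    dG[winner] = 0;
    q.push(winner);
    while (!q.empty()) {
        int k = q.front(); q.pop();
        float d = static_cast<float>(dG[k]);
        float force = alpha * std::exp(-(d * d) / (rMax * rMax));   // Eq. (1)
        net[k].x += force * (px - net[k].x);
        net[k].y += force * (py - net[k].y);
        if (dG[k] < static_cast<int>(rMax))        // expand the next "circle"
            for (int nb : net[k].links)
                if (dG[nb] == -1) { dG[nb] = dG[k] + 1; q.push(nb); }
    }
}
```

Because the queue processes nodes in non-decreasing \(d_G\), each circle is fully trained before the next one starts, matching the behavior described above.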

Apply SOM to TSP. When applying SOM specifically to TSP applications, every input city should have a chance to be a "trainer" of the same network, and this requirement is easily satisfied by assigning one thread per city. Once SOM has stopped, each city is projected to its closest non-occupied neuron to generate an initial TSP solution for further optimization. One TSP solution produced by SOM is shown in Fig. 6.

Fig. 6. A result of the proposed SOM implementation for lu980.tsp from TSPLIB.

3.3 Massively Parallel 2-opt with Data Decomposition

To optimize the TSP solution produced by SOM, various 2-opt strategies can be used, but the inherent properties of 2-opt make a massively parallel implementation more complex because of the tour ordering requirement. The reasons are as follows: first, one 2-opt move needs to consider the tour ordering to avoid cutting the tour, as shown in Fig. 1; second, several correct 2-opt moves found simultaneously in the same tour orientation may also cut the tour, as shown in Fig. 7(b, c). However, the non-interacting 2-exchanges shown in Fig. 7(a) can be executed in parallel without cutting the tour.
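As a reference for the representation of Fig. 2(b), executing one 2-exchange on the doubly linked tour only requires relinking the four involved cities; a minimal sketch with our own identifiers could be:

```cpp
#include <vector>

// Executing one 2-exchange on the undirected doubly linked tour of Fig. 2(b):
// edges (a, b) and (c, d) are replaced by (a, c) and (b, d). Only the links
// of the four involved cities change; no segment reversal is needed.
struct City { float x, y; int link[2]; };   // each city keeps its two tour neighbors

void replaceLink(City& city, int oldNb, int newNb) {
    if (city.link[0] == oldNb) city.link[0] = newNb;
    else                       city.link[1] = newNb;
}

void apply2Exchange(std::vector<City>& tour, int a, int b, int c, int d) {
    replaceLink(tour[a], b, c);   // edge (a, b) becomes (a, c)
    replaceLink(tour[c], d, a);
    replaceLink(tour[b], a, d);   // edge (c, d) becomes (b, d)
    replaceLink(tour[d], c, b);
}
```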

To obtain a massively parallel 2-opt optimization while respecting these properties of 2-opt, one simple choice is to adopt a strategy of massively parallel 2-opt evaluation with sequential execution, abbreviated as "parallel evaluation but sequential execution". Its principle is straightforward: the algorithm begins by simultaneously searching for each edge's 2-opt optimization in the same tour orientation according to a certain neighboring edge searching rule, but only the non-interacting 2-exchanges shown in Fig. 7(a) are sequentially detected and executed on the CPU side in one iteration.

Fig. 7. Massively parallel 2-opt framework: parallel evaluation but sequential execution. (a) Case of multiple 2-exchanges that do not influence each other. (b, c) Cases of multiple 2-exchanges interacting with each other: multiple 2-exchanges share the same edge in (b); executing the two 2-opt moves in (c) would cut the original integral tour.

Fig. 8. Massively parallel 2-opt framework with a certain edge searching rule.

The overall outline of this massively parallel 2-opt framework is shown in Fig. 8. The algorithm begins with a TSP solution represented by a doubly linked list in which each city \(P_m (m = 0,1,2 ... N-1)\) has exactly two links that compose the TSP tour ring. In one iteration, the first step follows the current tour solution and assigns each city a unique increasing tour order. We use this tour order to detect non-interacting 2-opt moves and to indicate the current tour orientation in later operations. After this "refresh tour order" step, the algorithm copies the necessary data from the CPU to the GPU side and launches kernel functions that check each edge's 2-opt optimization according to a certain neighboring edge searching rule and store each edge's 2-opt information at the edge's starting city according to the current tour orientation. Finally, the algorithm copies each edge's optimization information from the GPU back to the CPU side to sequentially detect and execute the non-interacting 2-opt exchanges.
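A hedged sketch of the sequential detection step is given below. For illustration we reduce each candidate to the tour orders of its two edge-starting cities plus its gain, and we greedily keep only moves whose tour-order intervals are pairwise disjoint; disjoint intervals cannot interact, so the selected exchanges can all be executed in the same iteration. The paper's exact detection rule may differ:

```cpp
#include <algorithm>
#include <vector>

// Sequential "detect and execute" step on the CPU side (illustrative).
// Each candidate move found by a GPU thread is summarized by the tour orders
// of the starting cities of its two removed edges and its gain.
struct Move { int ordFrom, ordTo; float gain; };   // with ordFrom < ordTo

std::vector<Move> selectNonInteracting(std::vector<Move> candidates) {
    // Prefer the most improving moves first.
    std::sort(candidates.begin(), candidates.end(),
              [](const Move& a, const Move& b) { return a.gain > b.gain; });
    std::vector<Move> accepted;
    for (const Move& m : candidates) {
        bool independent = true;
        for (const Move& a : accepted)
            if (!(m.ordTo < a.ordFrom || a.ordTo < m.ordFrom)) {  // intervals overlap
                independent = false;
                break;
            }
        if (independent) accepted.push_back(m);   // later executed by relinking
    }
    return accepted;
}
```

Each accepted move can then be applied with a relinking routine such as apply2Exchange sketched earlier.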

Under this parallel scheme, various concrete implementations can be applied. For example, the algorithm can simultaneously search for each edge's 2-opt optimization along the integral tour or within a local search range. Here, we apply a concrete edge searching rule that searches each edge's 2-opt optimization among its neighboring edges using the local spiral search operator, namely 2-opt LSS.
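For completeness, the gain test applied when checking edge (a, b) against a candidate neighboring edge (c, d) found by the spiral search could look like this (a standard 2-opt gain computation, with our own identifiers):

```cpp
#include <cmath>

struct Pt { float x, y; };

// Standard 2-opt gain: replacing edges (a, b) and (c, d) by (a, c) and (b, d)
// shortens the tour when the returned value is positive.
float dist(const Pt& p, const Pt& q) {
    float dx = p.x - q.x, dy = p.y - q.y;
    return std::sqrt(dx * dx + dy * dy);
}

float twoOptGain(const Pt& a, const Pt& b, const Pt& c, const Pt& d) {
    return dist(a, b) + dist(c, d) - dist(a, c) - dist(b, d);
}
```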

4 Experiments and Analysis

We implement this combined method on GPU, taking advantage of the GPU's parallel read/write operations on global memory with atomic control [24]. For the neighborhood 2-opt optimization using local spiral search (2-opt LSS), its optimization capability compared with the traditional sequential exhaustive 2-opt along the integral tour (2-opt EAT) is of interest. We test these two strategies on the same TSP solutions produced by SOM. Both optimization methods adopt a "first optimized, first accepted" evaluation strategy. As neither method can further optimize the TSP tour after a fixed number of iterations, we set a stop criterion to judge whether the tour has been optimized in the current iteration.

The parameter setting of Eq. 1 influences the result quality and running time of SOM. We apply the same parameters, shown in Table 1, for the different TSP instances. \(\alpha _{ini}\) and \(\alpha _{final}\) are the learning rates at the first and final epochs; \( r_{ini}\) and \(r_{final}\) are the neighborhood topological radii from the winner node at the first and final epochs; epo sets the number of iterations; LSSR sets the search range on the cellular matrix for the spiral search operator.

Table 1. Parameter setting for SOM.

Average results over ten tests for each TSP instance are shown in Tables 2 and 3. In each table, 2-opt works on the same SOM results. For each method listed in the two tables, "t(s)" is the average time taken by one test, including the time needed for generating random data, refreshing the TSP tour order and copying data from GPU to CPU; "%PDM" is the percentage deviation of the mean solution from the optimum solution; "%PDB" is the percentage deviation of the best solution from the optimum solution; "epo" is the average number of iterations in one test.

Table 2. Test results of sequential implementations.
Table 3. Test results of parallel implementations.
Fig. 9. Visual results of one test in Table 3. Columns from left to right for each TSP instance: (a) results of the parallel SOM; (b) results of the sequential exhaustive 2-opt along the tour; (c) results of the massively parallel 2-opt with local spiral search.

Comparing the running times in the two tables, the acceleration obtained by the proposed parallel implementations is obvious. We expect the acceleration factor to be even larger for bigger TSP instances, as the proposed parallel platform for both SOM and 2-opt requires O(N) memory and O(N) GPU threads.

Comparing solution quality, we should mention that the proposed model follows a procedure similar to the original standard SOM, and we did not try to tune the parameters for the best SOM performance. The maximum number of iterations, running time and result quality of the two 2-opt methods are, however, influenced by the initial TSP results of SOM.

Visual results of one test in Table 3 are shown in Fig. 9. Our experiments run on a laptop with an Intel(R) Core(TM) i7-4710HQ CPU at 2.5 GHz and a GeForce GTX 850M GPU card.

5 Conclusion

In this paper, we propose an alternative parallel computing platform for both SOM and 2-opt. We test the proposed methods with GPU parallel computation and report the obtained acceleration factors. We believe the acceleration factor will increase for very large scale instances as the capacity of parallel devices grows. Future work will concentrate on tests with very large TSP instances and comparisons with more combinatorial optimization methods.