SA Sorting: A Novel Sorting Technique for Large-Scale Data



Introduction
Sorting large sets of numbers in an optimized way is a challenging task. A number of optimized sorting algorithms exist, but their execution time can still be improved. Sometimes, these algorithms take the same amount of time to sort an already sorted record as they take to sort an unsorted one. A sorting algorithm should be stable, effective, less complex, and efficient, and SA sorting meets most of these criteria. SA sorting is introduced as a new approach that operates on both sorted and unsorted lists or records and shows better execution time on sorted lists. The following sections discuss the existing sorting approaches.

Bubble Sort.
It is a stable sorting algorithm in which each element in the list is compared with its next adjacent element, and the process is repeated until the elements are sorted. For n elements, there are (n − 1) passes and, mathematically, C = (n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2 comparisons in total. The algorithm for bubble sort is given in Algorithm 1. In this algorithm, if no swap occurs during a pass, the loop breaks and goes directly to the end; thus, only one pass executes, which gives the best-case complexity of O(n). The average- and worst-case complexity is O(n²). Bubble sort is highly code inefficient and is one of the worst sorting approaches, so professionals do not use it.
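As a concrete sketch of the pass-and-swap scheme with the early-exit check just described (the function name `bubble_sort` and the use of `std::vector<int>` are our choices, not taken from Algorithm 1):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort with the early-exit check: if a full pass performs no swap,
// the list is already sorted and the loop stops, giving the O(n) best case.
void bubble_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    for (std::size_t pass = 0; pass + 1 < n; ++pass) {
        bool swapped = false;
        for (std::size_t i = 0; i + 1 < n - pass; ++i) {
            if (a[i] > a[i + 1]) {           // compare adjacent elements
                std::swap(a[i], a[i + 1]);
                swapped = true;
            }
        }
        if (!swapped) break;  // no swap in this pass: already sorted
    }
}
```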

Insertion Sort.
It is a stable sorting algorithm in which, starting from the beginning of the record, each number is compared with the numbers before it to find its sorted position; when the position is found, the number is inserted there. The algorithm for insertion sort is given in Algorithm 2.
Insertion sort holds good for smaller datasets, since it is also code inefficient for large lists or big numbers. On average, insertion sort is about two times faster than bubble sort. For the best case, it is O(n), while for the average and worst cases, it is O(n²).
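A minimal C++ sketch of the insert-into-sorted-prefix idea (naming is ours, not from Algorithm 2):

```cpp
#include <cstddef>
#include <vector>

// Insertion sort: grow a sorted prefix; each new element is shifted left
// past larger elements until its sorted position is found, then inserted.
void insertion_sort(std::vector<int>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1] > key) {  // shift larger elements right
            a[j] = a[j - 1];
            --j;
        }
        a[j] = key;  // insert at its sorted position
    }
}
```

On a sorted input the inner loop never runs, which is the O(n) best case mentioned above.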

Selection Sort.
By nature, selection sort is unstable, but it can be modified to become stable. In this sorting technique, we find the smallest number in the record, put it at the starting position, advance the position by one, and then again find the smallest number in the remaining list. This process continues until the whole list is sorted. The algorithm is efficient for smaller records, but for larger records, this technique is again code inefficient. The algorithm is given in Algorithm 3.
Its execution time is better for smaller data/records (up to hundreds of records). The best-, average-, and worst-case complexities are all O(n²).
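The find-minimum-and-place procedure above can be sketched as follows (a plain illustration, not the stabilized variant):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Selection sort: for each position, find the smallest element in the
// remaining suffix and swap it into place, then advance the position.
void selection_sort(std::vector<int>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t min_idx = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j] < a[min_idx]) min_idx = j;   // track smallest so far
        if (min_idx != i) std::swap(a[i], a[min_idx]);
    }
}
```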

Merge Sort.
It is a stable sorting algorithm and is very efficient at handling big numbers. It is based on the following three steps [1]: (1) divide the given list or record into sublists in such a way that every list or sublist is halved; (2) conquer each sublist by recursion; (3) combine the sorted sublists simply by merging. The sorting is actually done in the second step. Merge sort is the only sorting technique among these which is purely based on the divide-and-conquer technique. It requires double the memory required by the other sorting techniques.
The algorithm for merge sort is given in Algorithms 4 and 5.
From the algorithm, merge sort has the recurrence relation T(n) = 2T(n/2) + cn. Using the master method with f(n) = cn, a = 2, and b = 2, we get n^(log_2 2) = n = Θ(f(n)), and thus T(n) = O(n log n). For all three cases, it is O(n log n).
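The three steps above can be sketched in C++ as follows (our naming, not the paper's Algorithms 4 and 5; the auxiliary buffer makes the extra-memory cost mentioned in the text explicit):

```cpp
#include <cstddef>
#include <vector>

// Combine step: merge two sorted halves a[lo, mid) and a[mid, hi)
// using an auxiliary buffer (the source of merge sort's extra memory).
static void merge(std::vector<int>& a, std::size_t lo, std::size_t mid,
                  std::size_t hi) {
    std::vector<int> buf;
    buf.reserve(hi - lo);
    std::size_t i = lo, j = mid;
    while (i < mid && j < hi) buf.push_back(a[i] <= a[j] ? a[i++] : a[j++]);
    while (i < mid) buf.push_back(a[i++]);
    while (j < hi)  buf.push_back(a[j++]);
    for (std::size_t k = 0; k < buf.size(); ++k) a[lo + k] = buf[k];
}

// Sorts the half-open range a[lo, hi): divide, conquer by recursion, combine.
void merge_sort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo < 2) return;                 // a single element is sorted
    std::size_t mid = lo + (hi - lo) / 2;    // divide
    merge_sort(a, lo, mid);                  // conquer left half
    merge_sort(a, mid, hi);                  // conquer right half
    merge(a, lo, mid, hi);                   // combine
}
```

A typical call is `merge_sort(v, 0, v.size())`; using `<=` in the merge keeps equal elements in order, which is what makes the sort stable.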

Quick Sort.
It is unstable but efficient and is one of the fastest sorting algorithms. It is based on the divide-and-conquer strategy. Quick sort immediately brings to mind the concept of the pivot element: one of the members of the list under sorting, often chosen at random. It works well for both smaller and larger lists or records, but if the list is already sorted, its performance can degrade unexpectedly. It is built on recursion. The algorithm is given in Algorithms 6 and 7, where the partition function arranges the elements around the pivot. If the recursion depth n equals the list size m, then it is the worst-case scenario and the complexity is O(m²). For the average- and best-case scenarios, n = log₂ m, and the complexity is O(m log₂ m).
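A minimal sketch of the pivot-and-partition idea (this uses a Lomuto-style partition with the last element as pivot; the paper's Algorithms 6 and 7 may differ in detail):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Partition a[lo..hi] around the pivot (here the last element); elements
// smaller than the pivot end up to its left, larger to its right.
static std::size_t partition(std::vector<int>& a, std::size_t lo,
                             std::size_t hi) {
    int pivot = a[hi];
    std::size_t i = lo;
    for (std::size_t j = lo; j < hi; ++j)
        if (a[j] < pivot) std::swap(a[i++], a[j]);
    std::swap(a[i], a[hi]);   // place the pivot at its final position
    return i;
}

// Sorts the inclusive range a[lo..hi] by recursion on both sides of the pivot.
void quick_sort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi >= a.size() || lo >= hi) return;   // empty or single-element range
    std::size_t p = partition(a, lo, hi);
    if (p > 0) quick_sort(a, lo, p - 1);
    quick_sort(a, p + 1, hi);
}
```

A typical call is `quick_sort(v, 0, v.size() - 1)`. With this fixed last-element pivot, an already sorted list forces the worst-case O(m²) behavior the text warns about; a randomly chosen pivot avoids it on average.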

Tree Sort.
It is an unstable sorting algorithm built on the binary search tree (BST). The elements in tree sort are output in sorted order using an in-order traversal. Tree sort requires extra memory space, and its complexity changes between a balanced and an unbalanced BST. The complexity of tree sort is O(n²) for the worst case and O(n log n) for the average and best cases.

Gnome Sort.
It is a stable sorting approach. When we think that a list or record is sorted but are not sure, we need an algorithm which works best on a sorted list; for this purpose, we use gnome sort. It performs well not only on sorted lists but also on unsorted ones. The algorithm is given in Algorithm 8.
From the algorithm, it is clear that if the list is sorted, no interchange of elements is done; hence, it executes linearly. Thus, for the best case, it is O(n), and for the average and worst cases, it is O(n²).
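The step-forward/step-back behavior can be sketched as follows (our naming, not the paper's Algorithm 8):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Gnome sort: walk forward while adjacent elements are in order; on a
// violation, swap and step back. On a sorted list only forward steps
// occur, giving the linear best case described above.
void gnome_sort(std::vector<int>& a) {
    std::size_t i = 0;
    while (i < a.size()) {
        if (i == 0 || a[i - 1] <= a[i]) {
            ++i;                        // in order: step forward
        } else {
            std::swap(a[i - 1], a[i]);  // out of order: swap and step back
            --i;
        }
    }
}
```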

Counting Sort.
It is a stable and easily understandable sorting algorithm. As the name suggests, counting sort works by finding the largest element in the given list/record; then, starting from the least element, the frequency of each value is counted, and finally the sorted list is produced while maintaining the order of occurrence of equal elements. It is useful in cases where the difference between the numbers is very small and the dataset is also very small. The step-by-step procedure of counting sort is discussed in Algorithm 9.
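A compact frequency-counting sketch for non-negative integers (our simplification; the record-stable variant in textbooks uses prefix sums over the counts, which matters when sorting records by key rather than plain integers):

```cpp
#include <algorithm>
#include <vector>

// Counting sort for non-negative ints: find the largest element, count the
// frequency of every value up to it, then emit each value in order.
std::vector<int> counting_sort(const std::vector<int>& a) {
    if (a.empty()) return {};
    int max_val = *std::max_element(a.begin(), a.end());
    std::vector<int> count(max_val + 1, 0);
    for (int x : a) ++count[x];           // tally frequencies
    std::vector<int> out;
    out.reserve(a.size());
    for (int v = 0; v <= max_val; ++v)    // emit from least to greatest
        out.insert(out.end(), count[v], v);
    return out;
}
```

The `count` array has `max_val + 1` entries, which is why the technique suits data whose values span a small range.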

Grouping Comparison Sort (GCS).
Suleiman, with a team of three other members, proposed the GCS algorithm. Their methodology is to divide the given list/record into groups of three elements each, and comparison is done in such a way that every element of one group is compared with those in the other groups. The main drawback of this algorithm is that the input size must be less than or equal to 25000 records to obtain good results. The complexity is O(n²) for all three cases.

Heap Sort.
Heap sort is an efficient but unstable sorting algorithm which is based on the complete binary tree and follows the heap order. A heap may be a min heap, in which the root node holds the minimum value, or a max heap, in which the root node holds the maximum value.
Journal of Computer Networks and Communications
The procedure of heap sort is explained through Algorithms 10 and 11.
For heap sort, in all three cases (best, average, and worst), the complexity is the same, O(n log n), where n is the number of records in the list.
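A max-heap-based sketch of the procedure (our naming, not the paper's Algorithms 10 and 11; the heap is stored implicitly in the array, with children of index i at 2i + 1 and 2i + 2):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property for the subtree rooted at i within a[0, n).
static void sift_down(std::vector<int>& a, std::size_t i, std::size_t n) {
    for (;;) {
        std::size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && a[l] > a[largest]) largest = l;
        if (r < n && a[r] > a[largest]) largest = r;
        if (largest == i) return;
        std::swap(a[i], a[largest]);
        i = largest;
    }
}

// Heap sort: build a max heap, then repeatedly move the root (the current
// maximum) to the end of the unsorted region and re-heapify.
void heap_sort(std::vector<int>& a) {
    std::size_t n = a.size();
    for (std::size_t i = n / 2; i-- > 0; ) sift_down(a, i, n);  // build heap
    for (std::size_t end = n; end > 1; --end) {
        std::swap(a[0], a[end - 1]);   // max goes to its final place
        sift_down(a, 0, end - 1);
    }
}
```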
Radix Sort.
It is a stable and efficient sorting algorithm when the size of the list/record is small. Internally, it acts like counting sort. One drawback of radix sort is that it operates on each number as many times as the number has significant digits. For example, for the number 169, radix sort operates on it three times, sorting from the least significant digit 9 to the most significant digit 1. Radix sort first compares the least significant digits of all the numbers and proceeds digit by digit to produce the sorted list. The procedure of radix sort is explained in Algorithm 12.
If the longest number has m digits and there are n elements, the complexity is O(m · n). If m is a constant, the digit count is ignored and the complexity becomes O(n) for all three cases.
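A least-significant-digit sketch for non-negative integers (our naming; each pass distributes the numbers into ten digit buckets, which keeps the pass stable as the text requires):

```cpp
#include <algorithm>
#include <vector>

// LSD radix sort: one stable bucket pass per decimal digit, from the least
// significant digit up to the most significant digit of the largest number.
void radix_sort(std::vector<int>& a) {
    if (a.empty()) return;
    int max_val = *std::max_element(a.begin(), a.end());
    for (long exp = 1; max_val / exp > 0; exp *= 10) {
        std::vector<std::vector<int>> buckets(10);
        for (int x : a)
            buckets[(x / exp) % 10].push_back(x);  // stable digit pass
        a.clear();
        for (const auto& b : buckets)              // collect in digit order
            a.insert(a.end(), b.begin(), b.end());
    }
}
```

For 169, the passes use exp = 1, 10, 100, i.e., digits 9, 6, 1, matching the three operations described above.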

Cocktail Sort.
It is a stable and efficient sorting algorithm compared with bubble sort, of which it is an extended version. Cocktail sort works on both sides of the list: during sorting, it moves the largest element toward the tail side and the smallest element toward the head side. The head side and tail side are shown in Figure 1.
Bubble sort puts the biggest element on the tail side after every pass, while cocktail sort puts both the smallest element on the head side and the biggest element on the tail side after every pass. The complexity of cocktail sort for the worst and average cases is O(n²), but for the best case, it is O(n).

Comb Sort.
It is a stable and another improved version of bubble sort: instead of always comparing adjacent elements, it starts with a large gap between compared elements and shrinks that gap by a factor of about 1.3 on every iteration until it reaches 1. The gap determines which pair of elements is considered for a swap; a larger gap lets out-of-place elements travel quickly, so the number of swaps decreases, and in the average-case scenario comb sort performs better. For the worst case, it remains O(n²), and for the best case it is O(n), as no swap is done.

Enhanced Selection Sort.
It is the extended version of selection sort, made stable by progressively decreasing the working size of the list. This is done in such a way that first the biggest element is found and swapped into place, the size of the list is decreased by 1, and the sort is repeated in the same way until the list is sorted. Although the complexity is the same as that of selection sort, in the best-case scenario the number of swaps is zero for enhanced selection sort. The complexity for all three cases is O(n²).

Shell Sort.
It is an unstable and efficient sorting algorithm which is an extended version of insertion sort. Shell sort works well if the given list is partially sorted, which corresponds to the average-case scenario. Shell sort uses Knuth's formula to calculate the interval (spacing): x = 3x + 1, where x starts at 1 and is the interval/spacing (giving the sequence 1, 4, 13, 40, ...). Shell sort divides the given list into sublists using this interval, called the gap, compares the elements, and then interchanges them depending on the order, either increasing or decreasing. The algorithm for shell sort is given in Algorithm 13. For the best and worst cases, the complexity is O(n log n) and O(n²), respectively; for the average case, it depends upon the gap sequence.
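A gapped-insertion-sort sketch using the Knuth sequence above (our naming, not the paper's Algorithm 13):

```cpp
#include <cstddef>
#include <vector>

// Shell sort with Knuth gaps: grow the gap via g = 3g + 1 (1, 4, 13, 40, ...),
// then run a gapped insertion sort for each gap down to 1.
void shell_sort(std::vector<int>& a) {
    std::size_t n = a.size(), gap = 1;
    while (gap < n / 3) gap = gap * 3 + 1;      // largest Knuth gap < n
    for (; gap > 0; gap = (gap - 1) / 3) {      // invert g = 3g + 1
        for (std::size_t i = gap; i < n; ++i) { // insertion sort with stride gap
            int key = a[i];
            std::size_t j = i;
            while (j >= gap && a[j - gap] > key) {
                a[j] = a[j - gap];
                j -= gap;
            }
            a[j] = key;
        }
    }
}
```

The final pass with gap 1 is ordinary insertion sort, which is fast because the earlier passes have left the list nearly sorted.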

Bucket Sort.
It is a stable sort that distributes the elements into buckets. The elements are inserted into the buckets, and then sorting is applied within each bucket. Bucket sort does not compare all elements against each other; it uses an index to place each element. These indexes are not obtained from an arbitrary mathematical function but are computed in such a way that they preserve the order of arrangement of the numbers inserted into the buckets. The procedure is explained in Algorithm 14.
The complexity of bucket sort is O(n) for all three cases.

Tim Sort.
It is a combination of insertion sort and merge sort. It is a stable and efficient sorting technique in which the list or record is split into blocks called "runs." If the size of the list or record is less than the run size, it can be sorted using insertion sort alone. The maximum size of a run is 64, depending upon the list or record. If the size of the unsorted list is very large, both insertion and merge sorting techniques are used. The complexity for the best, average, and worst cases is O(n log n).

Even-Odd/Brick Sort.
It is another extension of bubble sort, in which the algorithm is partitioned into two alternating phases: an even phase, which compares the adjacent pairs starting at even indexes, and an odd phase, which compares the adjacent pairs starting at odd indexes. Both phases are executed one after the other, and at the end the combined result is obtained and the records are sorted. It is a stable algorithm with the same complexity as bubble sort.

Bitonic Sort.
It is introduced through the concept of merge sort. In bitonic sort, we move the list to level L − 1 with two parts: the left part is arranged in increasing order, and the right part in decreasing order [2]. These parts are merged, moved to level L, and then sorted to form the sorted sequence. The complexity of bitonic sort is O(n²) for the best, worst, and average cases.

Literature Review and Related Work
Ali [3] discussed a number of sorting algorithms, evaluating their time complexity, stability, and in-place nature. Ali measured their running times in virtual and real environments and suggested where to use each particular algorithm to obtain efficient results, concluding that quick sort is a better option for sorting in the average-case scenario, while counting, bucket, and radix sorts are efficient for smaller lists/records and integer-type data.
Hammad [4] compared three sorting algorithms, namely, bubble, selection, and gnome sorts, based on their average running time. Hammad took a number of readings to find the running time and concluded that whenever the record list is sorted, gnome sort appears to be the fastest sorting algorithm, but when the list or record is unsorted, gnome sort takes the same running time as bubble sort or selection sort in their worst or average case (i.e., O(n²)).
Elkahlout and Maghari [5] discussed two advanced versions of bubble sort, namely, comb and cocktail sort, and one linear-time technique, namely, counting sort. Comparing these techniques, they concluded that cocktail sort performs better on an average evaluation of processing time. All these algorithms are implemented graphically in that paper, with the main focus on time complexity.
Jehad and Rami [6] made changes to bubble sort and selection sort so as to reduce the number of swaps during the sorting operation. The authors compare the enhanced versions of bubble sort and selection sort with the original algorithms and show reduced execution time. The complexity of enhanced bubble sort is reduced from O(n²) to O(n log n), while that of enhanced selection sort remains the same.

Pankaj [7], using the C programming language, compared five sorting techniques, namely, bubble, selection, quick, merge, and insertion sort, on the basis of average running time. The execution time is calculated in microseconds, and he concluded that quick sort is a better option for sorting between 10 and 10000 elements; the paper graphically represents the average running time of each algorithm.
Khalid et al. [8] proposed a new algorithm, namely, grouping comparison sort, which is then compared with traditional sorting techniques. The proposed algorithm is limited to an input size of 25000 elements; as the input size grows beyond that, the results degrade sharply. All the papers discussed above use traditional sorting algorithms, compare their average running times, and conclude which of them is better to use.

SA Sorting
Starting from the extreme left end of the list/record, the first element is taken as the target. The target is compared with all the other elements to its right; whenever a smaller element is found, the target is swapped with it, and the comparison continues until the extreme right end of the list. Then, going back to the target position, a new target element is taken at that position and processed in the same way. The position is not advanced until the element at that position has already been operated on; when the targeted element is found to be already operated on, the position moves ahead by 1. In this way, SA sorting works. The step-by-step process of SA sorting is given in Algorithm 15.

The number of comparisons, C, for SA sorting is C = n(n − 1)/2 + S, where S is the number of swaps. In the best case, S = 0, so C = n(n − 1)/2. Let T(n) = n(n − 1)/2, so T(1) = 0, T(2) = 1, and T(3) = 3. These counts satisfy the recurrence relation T(n) = 9T(n/3) + n. Checking by induction:
At n = 3, T(3) = 9T(3/3) + 3 = 9T(1) + 3 = 3.
At n = 6, T(6) = 9T(6/3) + 6 = 9T(2) + 6 = 15.
At n = 9, T(9) = 9T(9/3) + 9 = 9T(3) + 9 = 36, and so on.
Solving using the master method, a = 9, b = 3, f(n) = n, and n^(log_b a) = n^(log_3 9) = n² > n; since f(n) is polynomially smaller than n^(log_b a), T(n) = Θ(n²). SA sorting can be optimized in the future, but in comparison with optimized quick sort and merge sort, it performs better on an already sorted list. For the worst and average cases, the complexity is O(n²) + S; since S is included in C, it can be neglected, and thus, for all three cases, the complexity is O(n²).
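The steps above can be sketched in C++ as follows. This is our reading of the description and of Algorithm 15, not the authors' reference code; `sa_sort` returns the swap count so that the best-case S = 0 behavior on a sorted list is visible:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// SA sorting sketch: the element at the current position is the target.
// Scanning rightward, every time a smaller element is found it is swapped
// into the target position and the scan continues to the right end; after
// one full scan the position holds its final value and advances by 1.
std::size_t sa_sort(std::vector<int>& a) {
    std::size_t swaps = 0;
    for (std::size_t pos = 0; pos + 1 < a.size(); ++pos) {
        for (std::size_t j = pos + 1; j < a.size(); ++j) {
            if (a[j] < a[pos]) {          // smaller element found
                std::swap(a[pos], a[j]);  // swap it with the target
                ++swaps;
            }
        }
    }
    return swaps;  // S; on an already sorted list this is 0
}
```

On any input this performs the full n(n − 1)/2 comparisons; only the swap count S varies, matching C = n(n − 1)/2 + S above.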

Results
The proposed sorting technique is implemented in C++ and tested with different numbers of elements. The performance of SA sorting is measured in terms of execution time and memory required for sorting. The comparison of the execution time and memory used by existing sorting techniques with SA sorting is shown in Tables 1 and 2. Moving from smaller datasets to larger ones of a sorted nature, we found that SA sorting improves and performs better. Regarding the memory requirement from smaller datasets to larger ones, only a slight change can be seen.

Conclusion
While implementing all these sorting techniques and comparing them with SA sorting, the following points are concluded:
(1) If we increase the space used, the time reduces, as shell sort and heap sort show.
(2) The sorting techniques which work well on unsorted records are not very good on sorted records, as quick sort and merge sort show.
(3) In the worst-case scenario, most of the sorting techniques fall to O(n²), as SA sorting does.
(4) No sorting technique is universally applicable; the choice depends upon the data's nature and the user's requirements.
(5) SA sorting needs to be improved and optimized in the future.

Data Availability
Our article is purely based on algorithm design. We derive our results from unsorted and sorted data files containing different records, in order to compare the algorithm with already established algorithms. Thus, no external dataset has been used.

Conflicts of Interest
The authors declare that they have no conflicts of interest.