Fast Algorithm of Truncated Burrows-Wheeler Transform Coding for Data Compression of Sensors

Large numbers of sensors in the IoT (Internet of things) may generate massive data, which strains the limited sensor storage and network bandwidth, so the study of big data compression is very useful in the field of sensors. In practice, BWT (Burrows-Wheeler transform) can achieve good compression results for some kinds of data, but the traditional BWT algorithms are neither concise nor fast enough for sensor hardware, which restricts the BWT block to a very small and inadequate size. To solve this problem, this paper presents a fast algorithm of truncated BWT named “CZ-BWT algorithm” and implements it in the shareware named “ComZip.” CZ-BWT supports BWT blocks of up to 2 GB (or larger) and uses bucket sort. It is very fast, with time complexity O(N), and suits big data compression. The experimental results indicate that ComZip with the CZ-BWT filter is clearly faster than bzip2, and it can obtain a better compression ratio than bzip2 and p7zip in some conditions. In addition, CZ-BWT is more concise than current BWT implementations based on SA (suffix array) sorting and suits hardware BWT implementation in sensors.


Introduction
With the rapid expansion of IoT (Internet of things), many sensors are deployed in various fields, and they may generate massive data. Meanwhile, the storage capacity of sensors and the network bandwidth are limited, especially in a WSN (wireless sensor network). GBs or TBs of big data in IoT pose enormous challenges to the sensors.
Data compression is an effective way to reduce storage usage and speed up network transmission, and BWT (Burrows-Wheeler transform [1]) can achieve good compression results for some kinds of data. For example, a zone of a WSN may contain many lightweight sensors that collect temperature data, most of which are similar. A practical approach is therefore to use a few high-performance nodes in this WSN to gather these data, compress them with BWT, and transmit them to the back-end cloud platform.
BWT is also valuable in the field of bioinformatics. For example, large genome data need compression and indexing, and BWT is an effective way to provide both [2,3]. DNA data are special and well suited to BWT compression. Although we cannot simply compare bioinformatics software such as BWA (Burrows-Wheeler Aligner) with universal compression software such as bzip2, analyzing their BWT algorithms is meaningful.
But a practical problem is the speed of BWT for sensors and big data. High compression speed is important because the sensors have to process GBs of data or more, while the traditional BWT algorithms are neither concise nor fast enough, which restricts the BWT block to a very small and inadequate size. In our previous paper, we discussed the traditional compression software bzip2 [4]. Its BWT block size is at most 900 KB, which limits the compression ratio. Although this block is too small for big data, enlarging it noticeably slows down the compression. The primary reason is the computational cost of the traditional BWT algorithms. Besides, the hardware performance and energy budget of the sensors are limited, which makes it difficult to increase the BWT block size for big data compression.
We have designed a combined parallel algorithm named "CZ algorithm" to compress and encrypt big data efficiently and developed our compression software named "ComZip" [5]. Now we have made ComZip compatible with Linux platforms. As shown in the figures of [4,5], ComZip has a BWT filter. This paper focuses on the BWT filter and proposes a fast algorithm of truncated BWT named "CZ-BWT algorithm" to compress big data efficiently. The CZ-BWT algorithm has the following features:
(1) It uses truncated BWT to simplify the algorithm and achieve good performance.
(2) It uses bucket sort to speed up the BWT encoding and decoding with time complexity O(N), so that the BWT block size can rise to 2 GB or more to fit the big data compression.
(3) It can simplify the hardware design of the BWT filter, so that the sensors may use hardware to accelerate the BWT compression.
We conducted experiments on both x86/64 and ARM (advanced RISC machines) platforms to compare the data compression efficiency of ComZip with/without CZ-BWT, bzip2, and p7zip. The experimental results indicate that ComZip with the CZ-BWT filter is clearly faster than bzip2, and it can obtain a better compression ratio than bzip2, p7zip, and ComZip itself without the CZ-BWT filter in some conditions. In addition, the algorithm analysis shows that CZ-BWT is more concise than current BWT implementations based on SA (suffix array) sorting and suits hardware BWT implementation in sensors.
To support further experiments, we provide 2 versions of ComZip on the website: for Ubuntu Linux (x86/64 platform) and Raspbian (ARM platform). Researchers may download them from http://www.28x28.com/doc/cz_bwt.html.
The remainder of this paper is structured as follows: Section 2 describes the problems of BWT for sensors and big data compression. Section 3 introduces the CZ-BWT encoding and decoding algorithms. Section 4 analyzes the complexities of the CZ-BWT algorithm. The experimental results are given in Section 5. The conclusions and future work are given in Section 6.

Problems of BWT for Sensors and Big Data Compression
Numerous sensors in IoT can generate big data, but the bottlenecks of data transmission, storage, and computation in sensor networks need to be eliminated. Data compression meets this requirement. Figure 1 shows a typical scene in a WSN with both lightweight and heavy nodes, where BWT is feasible. This WSN has many lightweight nodes that sense the environment and generate massive data. Since they have limited energy, storage capacity, and computing resources, they can neither keep the data nor transmit them over long distances, whereas a few heavy nodes in the WSN can gather and compress the data and then transmit them to the back-end cloud platform. The cloud platform has plenty of resources to store, decompress, and analyze the data.
We discussed big data compression and encryption in the heavy nodes of a WSN in [5], but if the heavy nodes use BWT, the following problems remain:
(1) How can the BWT block be enlarged without a rapid decrease in encoding/decoding speed?
(2) Can we design simplified hardware BWT filters for the sensors?
A larger BWT block can achieve a better compression ratio. In this paper and the previous papers [4,5], we use the same definition of the compression ratio:
$$R = 1 - \frac{D_{zip}}{D} \qquad (1)$$
where $D_{zip}$ and $D$ are the volumes of the compressed and original data, respectively. If the original data are not compressed, R = 0. If the compressed data are larger than the original data, R < 0. Always R < 1.
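As a quick check of the definition, the ratio can be computed directly. This is a minimal sketch that assumes the form R = 1 − D_zip/D, which satisfies the three properties listed above; the function name `compression_ratio` is hypothetical.

```python
def compression_ratio(compressed_size: int, original_size: int) -> float:
    """Compression ratio R = 1 - D_zip / D (assumed form of the definition)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return 1.0 - compressed_size / original_size


# Halving the data gives R = 0.5; no compression gives R = 0;
# expansion gives R < 0; R can approach 1 but never reach it.
print(compression_ratio(50, 100))
print(compression_ratio(120, 100))
```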
Facing GBs or TBs of big data, a small block of 900 KB cannot show the power of BWT. But enlarging the block causes a performance bottleneck. As the analyses in Section 4 reveal, BWT encoding speed depends on the string sorting algorithm, and traditional BWT encoding has the time complexity O(N² lb N), where N is the block size and lb is the binary logarithm. If we change the block from 900 KB to 60 MB without any optimization, the encoding will become very slow. This is the first problem.
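To make "very slow" concrete, the growth of the two bounds can be compared for the block sizes mentioned above. This sketch ignores constant factors and assumes binary units (1 KB = 1024 B, 1 MB = 2^20 B); `cost_growth` is a hypothetical helper, and the numbers only describe the shape of the bounds, not measured times.

```python
import math


def cost_growth(n_old: int, n_new: int) -> tuple[float, float]:
    """How much the N^2*lb(N) bound and the N bound grow from n_old to n_new.

    Constant factors are ignored; only the shapes of the bounds are compared.
    """
    quadratic = (n_new ** 2 * math.log2(n_new)) / (n_old ** 2 * math.log2(n_old))
    linear = n_new / n_old
    return quadratic, linear


old_block = 900 * 1024          # 900 KB, the bzip2 maximum block
new_block = 60 * 1024 * 1024    # 60 MB
quad, lin = cost_growth(old_block, new_block)
print(f"N^2 lb N bound grows ~{quad:.0f}x, O(N) bound grows ~{lin:.0f}x")
```

The quadratic bound grows by roughly three orders of magnitude while a linear bound grows by under a hundredfold, which is why linear-time sorting matters for large blocks.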
Although the hardware development improves the performance of the heavy sensors, it is still a challenge for the sensors to achieve fast BWT encoding/decoding. For example, ARM platforms have multicore CPUs with low energy consumption, and the current flash memory has enough capacity and good performance to support a large BWT block, but a practical BWT filter must be fast enough. This is the reason we consider making hardware BWT filters for the sensors.
The problem is that the complex traditional BWT algorithms bring difficulties to the hardware design. If a hardware BWT filter is very complex, its performance will be limited and its energy consumption will be high, and then it is unfit for the sensors.
To solve the problems, we need to review the main related works around sensors and big data compression.
In [5], we discussed that current mathematical models and methods of lossless compression can be divided into 3 classes:
(1) Compression based on probabilities and statistics
(2) Compression based on dictionary indexes
(3) Compression based on the order and repetition of symbols; BWT belongs to this class
Journal of Sensors
Current popular compression software packages are comprehensive applications of the above basic classes, and their different features determine their compression ratio and speed. In particular, to compress big data in the sensors, we have 2 requirements:
(1) Compression speed: fast enough. Since the hardware performance of a sensor is limited, speed is very important. Software that is too slow, such as PAQ and WinUDA, is unfit for big data.
(2) Compression ratio: high. A large data window with good algorithms benefits the compression ratio. Software with too small a data window, such as WinZip (512 KB), WinRAR (2 MB), gzip (32 KB), and bzip2 (900 KB block), is unfit for big data.
In [4,5], we developed and updated the compression software ComZip, and in this paper, we developed its Linux version so that it can run on sensors such as ARM platforms. ComZip uses all 3 compression classes:
(1) In class 1, ComZip uses arithmetic coding [6] and the PPM (prediction by partial matching) algorithm [7], which can achieve a very good compression ratio.
(2) In class 2, LZ77 algorithm [8] is used, which has the advantage of speed.
(3) In class 3, BWT is used, which is the focus in this paper.
To solve the problem of BWT encoding/decoding speed, many algorithms have been developed. Current string sorting algorithms for BWT can reach linear time complexity O(N), for example, the algorithms using SA (suffix array) [9] such as the 3 most popular linear-time algorithms: KS [10], KA [11], and SA-IS [12]. Among them, SA-IS is currently the fastest. Moreover, further optimization of the SA-IS algorithm has been studied [13], and the first linear nonrecursive algorithm, named GSACA, is a new approach for the future [14].
Although current algorithms with SA are faster than the traditional BWT algorithms, it is not easy to apply them directly to sensors with hardware and energy limits. For the large BWT blocks needed for big data, the memory requirement of SA construction is many times the block size. Meanwhile, if we try to design a hardware BWT filter for the sensors, we face the complexity of algorithms such as the recursive computation in SA-IS [13]. GSACA is nonrecursive, but it is currently slower than SA-IS and its memory consumption is quite large [14], which are weaknesses given the limited computing resources of the sensors.
Parallel algorithms and the hardware design of BWT have also been studied to improve speed, including parallel architectures [3,15] and practical hardware acceleration, for example, FPGA (field-programmable gate array) [16] and GPU (graphics processing unit) [17]. These advances show that parallel algorithms benefit hardware BWT performance and that researchers tend to simplify the hardware design to obtain higher speeds [3,17], but algorithms such as SA-IS are still complex for the sensors. Thus, finding faster and simpler BWT algorithms is useful.
In [3], a limited SA length k is brought into the string sorting, which can reduce the computation. We call this method "truncated BWT." We also use truncated BWT in this paper, but the limited length is different because we do

CZ-BWT Encoding and Decoding
3.1. Concepts of CZ-BWT. The compression software ComZip uses the parallel pipeline named "CZ pipeline" and the truncated BWT named "CZ-BWT." We introduced the framework of the CZ encoding pipeline in [5], and the reverse framework is the CZ decoding pipeline. Figure 2 shows the same encoding framework; the only difference is the BWT filter in use, in which CZ-BWT runs.
CZ-BWT combines the following methods to improve the performance and simplify the algorithm design: (1) CZ-BWT uses truncated string sorting instead of SA sorting.
As shown in the first figure of [16], the principle of BWT is to sort the data so that they compress well. Sorting is the primary computation in BWT and determines its performance. We use the same example as in [16] to explain the truncated string sorting in CZ-BWT. Figure 3 shows the matrices for BWT sorting, with block size N = 8. As shown in (a), the traditional BWT uses full string sorting, which compares entire strings, for example, Row 0 "XYZAACOL" and Row 1 "YZAACOLX." The sorting result of (a) is shown in (b). Column 0 "AACLOXYZ" is the sorted string, and Column 7 "ZAAOCLXY" is the BWT output string. As shown in (c), SA sorting would compare suffixes of the same string, e.g., Row 0 "XYZAACOL" and Row 1 "YZAACOL," but the SA algorithms have been optimized to avoid such slow comparisons [9][10][11][12][13][14]. As shown in (d), truncated string sorting only compares short strings of length k < N, for example, Row 0 "XYZ" and Row 1 "YZA" with k = 3.
Figure 3(c) shows the SA sorting. Because the common SA sorting result is not always the same as the original BWT result [1] shown in (b), a special ending symbol "$" is appended to the string to bridge the difference between the results. This special symbol has a smaller code (e.g., −1) than any 8b binary code (0 … 255), which means the SA sorting algorithm needs special treatment for the ending symbol in addition to the normal 8b charset.
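The role of the ending symbol can be seen on the example string with a deliberately naive suffix sort. This is an illustration only, not SA-IS or any other linear-time algorithm; `bwt_via_suffix_array` is a hypothetical name, and "$" is modelled as the integer −1 as described above.

```python
def bwt_via_suffix_array(data: bytes) -> list[int]:
    """BWT built from a naively sorted suffix array with an ending symbol.

    The ending symbol "$" is modelled as -1, smaller than every byte value 0..255,
    so it cannot be represented inside a plain 8b charset.
    """
    text = list(data) + [-1]                      # append "$" = -1
    # Naive suffix sort for illustration only: compare list slices directly.
    suffix_array = sorted(range(len(text)), key=lambda p: text[p:])
    # Output the symbol preceding each sorted suffix (text[-1] wraps to "$").
    return [text[p - 1] for p in suffix_array]


out = bwt_via_suffix_array(b"XYZAACOL")
print("".join("$" if c < 0 else chr(c) for c in out))   # L Z A A O C $ X Y
```

The output carries the extra symbol and differs from "ZAAOCLXY" in Figure 3(b), so an SA-based encoder and its decoder must both handle "$" explicitly.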
Figure 3(e) shows the truncated string sorting result of (d). In this example, (b) and (e) are the same, but in practice, if 2 truncated strings are identical, for example "ABC" and "ABC," their order depends on their original positions. Therefore, the decoding algorithm of CZ-BWT differs from that of common BWT.
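The difference between full and truncated sorting can be sketched in a few lines. This is a minimal illustration, not ComZip's implementation: the function names are hypothetical, rotations are compared in the forward direction as in Figure 3, and rotations with equal k-symbol prefixes are ordered by their original positions as described above.

```python
def full_bwt(data: bytes) -> tuple[bytes, int]:
    """Classic BWT: sort all cyclic rotations completely (Figure 3(a)-(b))."""
    n = len(data)
    rows = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[i - 1] for i in rows)      # data[-1] wraps to the final byte
    return last, rows.index(0)                   # row holding the original string


def truncated_bwt(data: bytes, k: int) -> tuple[bytes, int]:
    """Truncated BWT: compare only the first k symbols of each rotation (Figure 3(d)-(e)).

    Rotations whose k-symbol prefixes are equal keep their original position order.
    """
    n = len(data)

    def prefix(i: int) -> bytes:
        return bytes(data[(i + j) % n] for j in range(k))

    rows = sorted(range(n), key=lambda i: (prefix(i), i))
    last = bytes(data[i - 1] for i in rows)
    return last, rows.index(0)


print(full_bwt(b"XYZAACOL"), truncated_bwt(b"XYZAACOL", 3))
print(full_bwt(b"XYXYXCOL"), truncated_bwt(b"XYXYXCOL", 3))
```

For "XYZAACOL" both functions return (b"ZAAOCLXY", 5), matching Figures 3(b) and 3(e). For "XYXYXCOL", the two occurrences of "XYX" make the results differ: the full BWT gives (b"XOCYYLXX", 5) and the truncated one (b"XOCYLYXX", 4), so a truncated output cannot be fed to a standard BWT decoder.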
(2) CZ-BWT sorts simple integers instead of strings, which in effect reverses the character sorting order. Figure 4 shows different types of data comparisons. Comparing truncated strings, for example, "XYZ" and "ACO" in (a), can be changed into simple integer comparison if we regard "XYZ" as a 24b integer. A 64b integer can substitute for a truncated string with length k < 9, but on most platforms, for example, x86/64 and ARM, the LSB (least significant byte) comes first in memory (little-endian), so the sorting order of the characters is reversed. As shown in (b), "Z" is the MSB (most significant byte) of the integer "XYZ," so the actual sorting result is the same as the reverse string sorting in (c).
(3) CZ-BWT uses bucket sorting instead of traditional merge-based or comparison-based sorting.
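The byte-order effect can be checked directly. In this sketch (the helper name `le_key` is hypothetical), three consecutive symbols of a cyclic block are read as a little-endian 24b integer, so the last symbol becomes the MSB; sorting such integers gives the same order as sorting the reversed strings.

```python
def le_key(block: bytes, i: int) -> int:
    """24b key for s[i-2..i] of a cyclic block, read as a little-endian integer.

    s[i-2] becomes the least significant byte and s[i] the most significant byte,
    mirroring how a little-endian CPU loads these bytes from memory.
    """
    n = len(block)
    window = bytes(block[(i - 2 + j) % n] for j in range(3))  # wraps around for i < 2
    return int.from_bytes(window, "little")


words = [b"XYZ", b"ACO", b"ZAA", b"AAC", b"LXY"]
by_integer = sorted(words, key=lambda w: int.from_bytes(w, "little"))
by_reversed_string = sorted(words, key=lambda w: w[::-1])
print(by_integer == by_reversed_string)
```

Because all keys have the same length, comparing little-endian integers is exactly a lexicographic comparison of the reversed strings, which is why CZ-BWT effectively sorts reversed contexts.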
Since string sorting is changed into integer sorting, CZ-BWT can use bucket sorting. With truncated strings of length k = 3, 256^3 buckets are needed. If memory is sufficient, k = 4 is feasible, with 256^4 buckets. Both encoding and decoding in CZ-BWT use bucket sorting.

3.2. CZ-BWT Encoding.
We use another example with BWT block length N > 1 KB. There are 2 phases in CZ-BWT encoding.
Phase 1 (building the bucket sorting links). We assume the BWT block data form a "cycle" string $s_{0\ldots N-1}$, which has the following feature:
$$s_{-2} = s_{N-2}, \quad s_{-1} = s_{N-1} \qquad (2)$$
and $s_{i-2\ldots i}$ is a 24b integer (i = 0, 1, …, N − 1). Then we build the bucket sorting links on s. Figure 5 shows an example of 2 links: "ZYX" and "OCA." The bucket array has 256^3 link headers, and all links end with the same null pointer. We define the structure of the links and their headers in (3): the bucket array stores the header of the link for each 24b integer, and the array link stores, for each position, the next node of the same link.
Phase 2 (outputting the sorted data). We follow each link to output the data. Figure 6 shows the example of the link "OCA," which outputs the characters "MA," referring to Figure 3(e). Finally, we output the start position of the block for CZ-BWT decoding.
Algorithm 1 shows the CZ-BWT encoding algorithm.
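The two encoding phases can be sketched with a link array and a table of link headers. This is a minimal Python sketch, not ComZip's code: the name `link_encode` is hypothetical, a dict stands in for the 256^k header array, and keys are read big-endian from the k symbols starting at each position, so the result is the truncated BWT of the block itself rather than of its reverse. Pushing positions onto the front of each link in descending order leaves every link in ascending position order, which keeps equal truncated strings in their original order.

```python
def link_encode(data: bytes, k: int) -> tuple[bytes, int]:
    """Truncated BWT via bucket links; returns (output block, start position)."""
    n = len(data)

    def key(i: int) -> int:
        # k symbols starting at i (cyclic), first symbol most significant.
        value = 0
        for j in range(k):
            value = (value << 8) | data[(i + j) % n]
        return value

    # Phase 1: build the bucket sorting links.
    heads: dict[int, int] = {}        # stands in for the 256**k link-header array
    link = [-1] * n                   # link[i] = next position in the same bucket; -1 = null
    for i in range(n - 1, -1, -1):    # descending, so each link ends up in ascending order
        v = key(i)
        link[i] = heads.get(v, -1)
        heads[v] = i

    # Phase 2: scan every header in key order and follow its link.
    out = bytearray()
    start = -1
    for v in range(256 ** k):         # 16.7 M iterations for k = 3: slow in pure Python
        i = heads.get(v, -1)
        while i != -1:
            if i == 0:
                start = len(out)      # row that holds the original block
            out.append(data[i - 1])   # symbol preceding position i (cyclic)
            i = link[i]
    return bytes(out), start


print(link_encode(b"XYZAACOL", 2))
```

Each position is inserted once and visited once, and the header scan touches each of the 256^k buckets once, so the work is linear in N for a fixed k.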
3.3. CZ-BWT Decoding. We use the same example as shown in Figure 3, but the string is reversed into "LOCAAZYX" because of the MSB/LSB order in the integer sorting. Figure 7(a) shows the full decoding matrix of CZ-BWT, which corresponds to Figure 3(e). Note the column numbers of this matrix: Row 6 is the reversed string "YZAACOLX," and Row 5 is the original data, that is, the reversed string "XYZAACOL." Recovering the whole decoding matrix is not necessary. Because CZ-BWT uses truncated data sorting, its decoding differs from that of the general BWT. As shown in Figure 7, there are 4 phases in CZ-BWT decoding.
Phase 1 (building the second column of the matrix). As shown in Figure 7(b), this phase is an 8b integer bucket sort. The second column is Column 7, and Column 0 stores the input data, which are the output data of CZ-BWT encoding. The bucket array has 2^8 counters, so we can scan Column 0 once and write the sorted data to Column 7.
Phase 2 (building the third column of the matrix). As shown in Figure 7(c), this phase is a 16b integer bucket sort. The bucket array has 2^16 counters, so we can scan Column [0,7] once and write the sorted data to Column [7,6]. Because Column 7 is the most significant byte of these 16b integers and is already sorted, Column 7 remains the same as in phase 1. Thus, we only need to write Column 6.
Phase 3 (building the fourth column of the matrix). As shown in Figure 7(d), this phase is a 24b integer bucket sort. The bucket array has 2^24 counters, and we can scan Column [0,7,6] once and write the sorted data to Column [7,6,5]. This time, however, we need not write Column 5, because the link headers in phase 4 already have the same 24b sorting effect. Skipping this write-back simplifies the algorithm and improves the decoding speed.
Phase 4 (outputting along the bucket sorting links). As shown in Figure 7(e), this phase outputs the decoded block. We can easily change the bucket array from data counters into link headers by accumulating the counter values: because Column [7,6,5] is sorted, the current link header position plus the current counter value gives the next link header. For example, in Figures 7(d) and 7(e), focusing on Column [7,6,5]:
$$\mathrm{header}[\text{"LXY"}] + \mathrm{counter}[\text{"LXY"}] = 3 + 1 = 4 = \mathrm{header}[\text{"OLX"}] \qquad (5)$$
According to the "cycle" string $s_{0\ldots N-1}$ as shown in (2), we define the data counters in Figure 7 as follows:
$$\mathrm{counter}[v] = \left|\left\{\, i \in \{0, 1, \ldots, N-1\} : s_{i-2\ldots i} = v \,\right\}\right| \qquad (6)$$
Then we can get the link headers in Figure 7(e) from the data counters in Figure 7(d) as follows:
$$\mathrm{header}[v] = \sum_{u=0}^{v} \mathrm{counter}[u] \qquad (7)$$
...
function encode(s) { /* s is the BWT block data string (original data) */
    link = array(0 … N − 1);         /* N is the length of string s */
    bucket = array(0 … 256^3 − 1);   /* bucket stores the link headers of (3) */
    for j = 0 … 256^3

Here is a trick for algorithm optimization. The exact value of a header should be 1 smaller than that given by (7). For example, according to (7), header "LXY" = 4, while in (5), header "LXY" = 3 exactly. The trick works as follows. There are 256^3 links in Column [7,6,5] in Figure 7(e), and their 256^3 headers are dynamic. In this phase, the headers are first calculated with (7), and then each output character of string s causes the corresponding header to switch to the next link node position:
$$\mathrm{header}[v] \leftarrow \mathrm{header}[v] - 1 \qquad (8)$$
When a counter has a value larger than 1, for example, counter "ACO" = 2 in Figures 5 and 6 (the string is reversed), the dynamic header determines which node is the current link node position. The next position in the same link is easily calculated with (8) because Column [7,6,5] in Figure 7(e) is already sorted.
The trick saves operations: each time, we have to fetch a header value to locate the position and then decrease the value with (8) for the next fetch, so we can simply merge the "fetch" and "decrease" operations. As long as the initial value of each header in (7) is 1 larger than the exact value, we can use a single "fetch & decrease" operation each time.
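The decoding phases and the "fetch & decrease" trick can be sketched as follows. This is a minimal Python sketch under stated assumptions, not ComZip's code: `inverse_truncated_bwt` is a hypothetical name; it assumes an encoder that sorts rotations by their first k symbols, orders ties by ascending start position, outputs the symbol preceding each row, and reports the row of the original rotation as the start position. Python's `sorted` stands in for the counter-based bucket passes, and a dict stands in for the header array.

```python
from collections import Counter


def inverse_truncated_bwt(last: bytes, primary: int, k: int) -> bytes:
    """Invert an order-k truncated BWT whose ties are ordered by ascending position.

    last    -- the transformed block (last column), e.g. b"ZAAOCLXY"
    primary -- row index of the original rotation (the start position)
    k       -- truncation length used by the encoder (k >= 1)
    """
    n = len(last)
    # Phases 1..k-1: rebuild the first j context symbols of every sorted row.
    # sorted() stands in for the 2^8, 2^16, ... counter-based bucket passes.
    prefixes = [b""] * n
    for _ in range(k - 1):
        prefixes = sorted(bytes([last[r]]) + prefixes[r] for r in range(n))
    # Key of row r: the k-symbol context of the rotation that precedes row r.
    keys = [bytes([last[r]]) + prefixes[r] for r in range(n)]
    # Turn counters into link headers: header[v] = number of rows with key <= v,
    # i.e. one past the last row of bucket v (the "+1" trick).
    counts = Counter(keys)
    header = {}
    running = 0
    for key in sorted(counts):
        running += counts[key]
        header[key] = running
    # Final phase: walk backwards through the text with "fetch & decrease".
    out = bytearray()
    row = primary
    for _ in range(n):
        out.append(last[row])
        header[keys[row]] -= 1        # decrease ...
        row = header[keys[row]]       # ... and fetch the next row
    out.reverse()                     # symbols were produced back to front
    return bytes(out)


print(inverse_truncated_bwt(b"XOCYLYXX", 4, 3))
```

Before the final reversal, the walk on b"XOCYLYXX" produces "LOCXYXYX", the same reversed string as in the worked example. Taking the last remaining row of each bucket works because the walk visits text positions in strictly descending order, so each bucket is consumed from its largest position downwards.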
Algorithm 2 shows the CZ-BWT decoding algorithm. We use 2 bucket arrays to merge the phases and save data-access time.
In this example, we use the reversed string "XYXYXCOL" as the original data, which contains two occurrences of the characters "XYX." Figure 8 shows the CZ-BWT encoding of Algorithm 1. We can see the correspondence between Figures 3(d) and 3(e) and Figures 8(a) and 8(c). In Figure 8(a), the characters "XYX" at positions 1 and 2 are sorted; as a result, Figure 8(b) shows that the "XYX" position sequence is unchanged. The situation in Figure 6 is similar.
Figures 9 and 10 show decoding phases 1 to 4 according to Figure 7. As the phases and key operations are described in Algorithm 2, the data changes in Figures 9 and 10 let us follow the process of decoding the reversed string "XOCYLYXX." In Figure 9(a), the input data "XOCYLYXX" are counted and then sorted in Figure 9(b). This is a typical bucket sort with 256 counters. The bucket sorting proceeds again in Figures 9(c) and 9(d), with 256^2 counters. The array link stores the sorting results.
In Figure 9(e), the data are counted with 256^3 counters, but the sorting is hidden in the calculation of the link headers in Figure 9(f). Thus, the array link does not actually store Column 5. Figure 10 shows the data output process in phase 4. In Algorithm 2, the output operation stores the output characters in a string s. Figures 10(a) and 10(b) give the example of outputting s[0] and s[1].
These steps repeat until we obtain the full-length s = "LOCXYXYX," which is reversed. As mentioned in Section 3.3, CZ-BWT decoding outputs reversed data. Because CZ-BWT encoding also outputs reversed data through the backward links in Figures 5 and 6, the decoding algorithm reverses the data again and finally recovers the original data.
As shown in Figures 10(c) and 10(d), the decoding algorithm can maintain the correct values of the dynamic headers, for example, header["XYX"], which keeps the proper order of the character outputs.

Analyses of the CZ-BWT Algorithm
Quite a few recent advances in BWT algorithms are driven by the rapid development of genome information technologies [2,3,13,18], and many DNA software tools use BWT, including DNA compression, alignment, sequencing, and indexing. Because DNA data and common data use different charsets, we cannot conduct direct experiments to compare DNA software such as BWA (Burrows-Wheeler aligner) with universal compression software such as ComZip or bzip2, but we can analyze their BWT algorithms to investigate their advantages and shortcomings.

Time Complexities.
We may study the BWT encoding and decoding algorithms by analyzing their time and space complexities in the worst cases. First, it is known that the traditional BWT encoding algorithm has the time complexity O(N² lb N), where N is the block size. The analyses are as follows. According to the principle of BWT compression [1], the key computation of BWT encoding is string sorting, which determines the encoding speed. String sorting consists of 2 algorithms:
(1) The comparison of 2 strings: the length of each string is equal to the BWT block size N, so this string comparison has the time complexity O(N).
(2) The sorting of data elements: in BWT encoding, a data element is a string, and the number of strings is equal to the block size N. Traditional sorting algorithms such as quick sort and heap sort have the time complexity O(N lb N), and it has been proven that no comparison-based sorting algorithm can use fewer than on the order of N lb N comparisons in the worst case.
From the above 2 algorithms, we find that the fastest traditional BWT encoding has the time complexity O(N² lb N), which is not fast enough. As mentioned in Section 2, the fastest well-known string sorting algorithm is currently SA-IS, which has the time complexity O(N). So we compare CZ-BWT encoding, which is used in ComZip, with SA-IS encoding, which is used in BWA. According to Algorithm 1, the bucket sorting also has the time complexity O(N), but we can compare more details.
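The two cost estimates can be written side by side. The accounting for CZ-BWT below is a reading of Algorithm 1 under the assumption that phase 1 inserts N link nodes and phase 2 scans all 256^k headers while following N link nodes; it is not a formula stated in the original analysis.

```latex
% Traditional BWT: every comparison may scan a whole rotation of length N
T_{\text{trad}}(N) = \underbrace{O(N)}_{\text{one string comparison}}
  \times \underbrace{O(N\,\mathrm{lb}\,N)}_{\text{comparisons, e.g. heap sort}}
  = O(N^{2}\,\mathrm{lb}\,N)

% CZ-BWT encoding with truncation length k (phase 1 + phase 2)
T_{\text{CZ}}(N) = \underbrace{O(N)}_{\text{phase 1: insert } N \text{ link nodes}}
  + \underbrace{O(256^{k} + N)}_{\text{phase 2: scan headers, follow links}}
  = O(N) \quad \text{for fixed } k
```

The 256^k term is a constant for a fixed k, which is why it vanishes from the bound, but it still matters in practice for small blocks.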
SA-IS requires a special ending symbol for the block and a recursive reduction to a shorter string [2,13], both of which are more complex than CZ-BWT. SA-IS needs to scan the BWT block more than 3 times, while according to Algorithm 1, CZ-BWT encoding scans the block only twice, in phases 1 and 2. Thus, CZ-BWT encoding is faster.
GSACA also requires a special ending symbol. It is nonrecursive and also has 2 phases [14], but each phase involves many more operations than the simple block scans in CZ-BWT encoding. These operations currently make GSACA slower than SA-IS and CZ-BWT.

Space Complexities.
Memory usage is important for the sensors. SA-IS, GSACA, and CZ-BWT have the same space complexity O(N). In detail, BWA uses 5.37N of RAM (random access memory); GSACA uses 12N in addition to 4N for the suffix array; and CZ-BWT uses 4N to store the links for block sizes up to 2 GB. Moreover, CZ-BWT can easily use 5N for block sizes up to 512 GB. In this respect, CZ-BWT needs less memory for the block than BWA with SA-IS and the GSACA program.
But CZ-BWT requires extra RAM for the bucket array. If the block size is not more than 2 GB, a bucket counter uses 4 B, and a bucket array has 256^k elements. If k = 3, a bucket array uses 64 MB, which is feasible in a heavy sensor node. If k = 4, it needs 16 GB, which is feasible on current cloud platforms.
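The memory figures above follow from simple arithmetic. This sketch assumes 4-byte counters and 4-byte (or 5-byte) link entries as stated in the text and binary units (1 MB = 2^20 B); the helper names are hypothetical.

```python
def bucket_array_bytes(k: int, counter_bytes: int = 4) -> int:
    """Memory for one bucket array of 256**k counters or link headers."""
    return (256 ** k) * counter_bytes


def link_array_bytes(block_size: int, entry_bytes: int = 4) -> int:
    """Memory for the link array: one entry per byte of the block (4N or 5N)."""
    return block_size * entry_bytes


MiB, GiB = 2 ** 20, 2 ** 30
print(bucket_array_bytes(3) // MiB, "MiB for k = 3")
print(bucket_array_bytes(4) // GiB, "GiB for k = 4")
print(link_array_bytes(2 * GiB) // GiB, "GiB of links for a 2 GB block")
```

Note that the bucket array size is independent of N, whereas the link array grows linearly with the block.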

Complexities of Hardware Design.
Hardware acceleration is valuable for the sensors which have limited computing resources. Due to the complexity of SA-IS, it is difficult to implement the hardware BWT with SA-IS. As a contrast, CZ-BWT is simpler and easier for the hardware acceleration.
The figures in [16] show that truncated BWT suits hardware design, but that design uses merge sort, which has the time complexity O(N² lb N). CZ-BWT uses bucket sort, which is both fast and easy to implement in hardware. Figure 11 shows the primary hardware design of bucket sort. We take Figure 11 … And this RAM can provide the data, for example, "LXY," to the address bus of the bucket array RAM. Then, the latter RAM provides the current counter value to the ALU (arithmetic and logic unit), which updates the value and writes it back to the RAM. Simple sequence-control logic circuits can make this module work. From the hardware point of view, both (a) and (b) in Figure 11 are succinct, and their speed is easy to optimize.
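The read-modify-write cycle described above can be modelled in software to make the data flow explicit. This is a behavioural sketch only, with a hypothetical function name; it assumes the keys are already stored in a data RAM and makes no claim about the actual circuit timing in Figure 11.

```python
def hardware_style_count(data_ram: list[int], key_bits: int) -> list[int]:
    """Software model of a bucket-counting module: read key, read counter, ALU +1, write back.

    data_ram  -- sequence of keys (e.g. 24b integers such as "LXY") already in memory
    key_bits  -- width of the bucket-RAM address bus (24 for k = 3)
    """
    bucket_ram = [0] * (1 << key_bits)     # one counter word per address
    for address in range(len(data_ram)):   # sequence control: step the data-RAM address
        key = data_ram[address]            # data RAM output drives the bucket-RAM address bus
        value = bucket_ram[key]            # bucket RAM outputs the current counter
        value += 1                         # ALU updates the value
        bucket_ram[key] = value            # write the value back
    return bucket_ram


counts = hardware_style_count([3, 3, 5], key_bits=8)
print(counts[3], counts[5])
```

Each key needs only one read, one increment, and one write, with no comparisons or recursion, which is what keeps the control logic simple.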

Weaknesses.
As a truncated BWT, CZ-BWT cannot use the standard BWT decoding, which is available to BWT with SA-IS. Both CZ-BWT decoding and standard BWT decoding have the time complexity O(N), but the latter is faster. According to Algorithm 2, CZ-BWT decoding has to scan the block 4 times in phases 1 to 4, while standard BWT decoding scans the block only twice.
This weakness is acceptable for the sensors. Although CZ-BWT decoding is slower than standard BWT decoding, its speed is still linear. It also avoids complex features such as the special ending symbol and the recursive algorithm, so hardware acceleration for CZ-BWT can be applied in sensors relatively easily. Moreover, the typical scene in Figure 1 suggests that most BWT decoding occurs on the cloud platform, which has plenty of computing resources, so the speed of CZ-BWT decoding is fast enough in this case.
Another weakness is that truncated BWT has lower compression ratios than standard BWT. But we can use a larger block in CZ-BWT to keep up with the compression ratio, as the experimental results confirm. Table 1 shows the comparison of typical BWT algorithms. Note the distinction between the BWT variants: CZ-BWT is a kind of truncated BWT, while bzip2, SA-IS, and GSACA use standard BWT, and bzip2 uses the traditional BWT algorithm, which is slow.

Experimental Results
We conducted experiments to compare ComZip, WinRAR, and 7-zip in [5]. The results indicate that ComZip with a large data window achieves a better compression ratio than WinRAR and 7-zip in most cases, and its compression speed is faster than that of 7-zip. But those experiments did not use the BWT filter, which contains the CZ-BWT algorithms of ComZip.
In this paper, we compare the following software packages in the experiments: ComZip with CZ-BWT, bzip2, ComZip without CZ-BWT, and p7zip (7-zip for Linux). When we test ComZip without CZ-BWT, we vary its data window size. When we test ComZip with CZ-BWT, we vary its block size and use a fixed 4 MB data window for its LZ77 algorithm [8]. We choose a small data window of 4 MB to highlight the abilities of CZ-BWT; a data window smaller than 4 MB may reduce the performance of the BWT filter.
The experiments in this paper were run on 2 hardware platforms: x86/64 and ARM. Their performance may provide references for future and current heavy sensor nodes. The operating systems of both experimental platforms are Linux. We have developed ComZip for Linux and provide it on the website, so researchers may use it to run more experiments with new data. It can be downloaded from http://www.28x28.com/doc/cz_bwt.html.

Tests on the x86/64 Platform.
This platform is a common laptop with the following equipment: Intel Core i7-4700MQ 4-core & 8-thread CPU, 16 GB DDR3 RAM, 128 GB SSD (solid-state disk), and Ubuntu Linux 12.10 (x64). We regard this laptop as a future high-end mobile sensor, assuming fuel cells can provide enough energy. The software versions are ComZip v20171019 (64b), bzip2 1.0.6, and p7zip 9.18.
In this experiment, we use different data windows or block sizes to compress the same original file named "book.htm," an example of data for which BWT shows its advantage in compression ratio. It is a real data file of storage records from a Chinese bookshop in HTML/XML format. Its original length is 346,499,594 B. It can be downloaded from http://www.28x28.com/doc/book.htm.bz2. Table 2 and Figure 12 show the relationship between the compressed file size and the data window/block size. From (1), this relationship is in effect the relationship between the compression ratio and the window size.
Table 3 and Figure 13 show the relationship between the compression/decompression time and the data window/block size. Figure 13 omits the decompression time because this weakness of CZ-BWT is analyzed in Section 4. We focus on compression performance first; optimizing decompression in ComZip is our future work.
In Table 2 and Figure 12, we observe that ComZip with CZ-BWT has the best compression ratio among these software packages, and p7zip has the worst, except for bzip2 with a 0.1 MB block. The 0.1 MB block is too small for big data compression. If the block is large enough, the standard BWT in bzip2 does achieve a better compression ratio than truncated BWT with the same block size: when bzip2 uses a 0.9 MB block, ComZip needs about a 7 MB block to obtain a better compression ratio.
But enlarging the block for bzip2 is not practical. Table 3 and Figure 13 show that bzip2 has the slowest compression speed and that its curve rises rapidly, which is consistent with the analysis that traditional BWT has the time complexity O(N² lb N). From this trend, we can estimate how slow bzip2 would be with a 512 MB block.
According to the compression speed shown in Figure 13, ComZip with CZ-BWT is slower than p7zip, but their curves are close. The curve from 1 to 8 MB shows that a block smaller than 8 MB may reduce the performance of CZ-BWT with a 4 MB data window, and the curve from 8 to 512 MB is consistent with the analysis that CZ-BWT has the time complexity O(N). If we could find a universal compression software package using SA-IS, we expect its curve would look like this one for CZ-BWT.
ComZip without CZ-BWT is much faster than the others in Figure 13. We offer two possible reasons. The first is the parallel CZ encoding pipeline introduced in [5]: this platform, with an 8-thread CPU, large RAM, and an SSD, lets the pipeline perform well. The second is that the data file in this experiment suits the optimized LZ77 algorithm mentioned in [4], so ComZip performs notably well.
In summary, the experiment results on this x86/64 platform show that ComZip with CZ-BWT can achieve the best compression ratio among these programs, and its compression speed is close to that of p7zip, which is practical for big data.

5.2. Tests on the ARM Platform.
This platform is a popular Raspberry Pi 2 Model B with the following equipment: ARM Cortex-A7 4-core CPU, 1 GB DDR RAM, 64 GB Micro SDXC (SD eXtended Capacity) card, and Raspbian Linux 7. We regard this inexpensive Raspberry Pi as a typical current heavy node of mobile sensors. The software versions are ComZip v20171019 (32b), bzip2 1.0.6, and p7zip 9.20. In this experiment, we again use different data window/block sizes to compress the same original file, "book.htm," so that the results on the x86/64 and ARM platforms can be compared. Table 4 and Figure 14 show the relationship between the compressed file size and the data window/block size, and Table 5 and Figure 15 show the relationship between the compression/decompression time and the data window/block size.
Figure 14: Compressed file size and data window/block size in Table 4.
The only difference between Tables 2 and 4 is the size of the files compressed by ComZip. Even when the data window or block size is the same, ComZip generates a different compressed file, for the reason explained in [5]: ComZip is also chaotic encryption software, so compressing the same file twice yields two completely different compressed files. However, the difference in length is so tiny that its influence on the compression ratio can be ignored.
This experiment is limited by the platform hardware, especially the 1 GB RAM. When the block size is enlarged to 64 MB, ComZip with CZ-BWT aborts for lack of RAM: the bucket array and the operating system also occupy RAM, so a total of 1 GB is inadequate. If the RAM were enlarged to 2 GB, we estimate that the workable block size could reach 300 MB.
Figure 15 shows that bzip2 is much slower than the others, and its curve also rises rapidly. ComZip with CZ-BWT is faster than p7zip when the data window/block size is between 2 and 8 MB and slower than p7zip otherwise.
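One simple way to quantify how rapidly such a time curve rises is to fit the slope of log(time) against log(block size): a slope near 1 is consistent with linear growth, while a slope near or above 2 indicates roughly quadratic or worse growth. The Python sketch below is a generic least-squares fit, not part of ComZip; the function name growth_exponent is hypothetical, and the numbers in its usage example are synthetic, not measurements from Tables 3 or 5.

```python
import math


def growth_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    For t = c * N**p the slope is p; logarithmic factors (e.g. N log N)
    push the fitted slope above the polynomial exponent.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic timings only (not data from this paper).
    sizes_mb = [8, 16, 32, 64]
    print(round(growth_exponent(sizes_mb, [0.5 * s for s in sizes_mb]), 3))        # 1.0
    print(round(growth_exponent(sizes_mb, [0.01 * s ** 2 for s in sizes_mb]), 3))  # 2.0
```

Fitting only the large-block part of a curve (for example, 8 to 512 MB) avoids the small-block effects mentioned for CZ-BWT above.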
ComZip without CZ-BWT is again the fastest on this platform, although the 4-core CPU limits the performance of the parallel CZ encoding pipeline.
In summary, the experiment results on this ARM platform also show that the compression speed of ComZip with CZ-BWT is practical. Although the block size is limited by the RAM, ComZip with CZ-BWT still has the best compression ratio among these programs.
Table 5.

5.3. Tests with Other Data.
BWT does not always achieve a better compression ratio than methods without BWT; we find that only certain kinds of data suit BWT compression well. This experiment uses the same x86/64 platform, but the data file is changed to "lamp.vdi," a real virtual machine image file of a Linux data partition. The original length of this file is 527,467,008 B. Table 6 shows the relationship between the compressed file size and the data window/block size, and Table 7 shows the relationship between the compression/decompression time and the data window/block size.
In Table 6, we observe that bzip2 has the worst compression ratio among these programs, and ComZip with CZ-BWT the second worst. In Table 7, we observe that bzip2 and ComZip with CZ-BWT are both slower than p7zip and ComZip without CZ-BWT. Thus, these results give an example of data that gains neither compression ratio nor speed from BWT. Since this paper focuses on BWT algorithms, researchers may test their own data to determine which kinds of data suit BWT well.
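As a starting point for such tests, the sketch below compresses the same bytes with Python's standard-library bz2 module (the BWT-based bzip2 format) and lzma module (the LZMA2 filter, an LZ77-family method without BWT) and reports the compressed sizes. ComZip and CZ-BWT are not involved; the helper name compare_codecs and the sample data are assumptions for illustration.

```python
import bz2
import lzma
import sys


def compare_codecs(data: bytes) -> dict:
    """Return compressed sizes in bytes for a BWT codec and a non-BWT codec."""
    return {
        # compresslevel=9 selects bzip2's largest block (900 kB).
        "bzip2": len(bz2.compress(data, compresslevel=9)),
        # Default xz container with the LZMA2 filter at preset 6.
        "lzma": len(lzma.compress(data, preset=6)),
    }


if __name__ == "__main__":
    # Usage: python compare_codecs.py <file>; falls back to a small sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            payload = f.read()  # reads the whole file into memory
    else:
        payload = b"node=17;temp=21.5C;hum=40%;\n" * 5000
    if not payload:
        sys.exit("empty input")
    for name, size in compare_codecs(payload).items():
        print(f"{name}: {size} B ({size / len(payload):.4f} of original size)")
```

The printed fraction is simply compressed size divided by original size; it is shown for convenience and may not match the exact definition in (1).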
Taken together, the above experiment results support the advantages of CZ-BWT: a better compression ratio for some kinds of data, and a shorter compression time than the other universal BWT compression software. These results also provide references for the performance of CZ-BWT on x86/64 and ARM platforms, which suggest that using CZ-BWT in current and future sensors is feasible and practical.
However, these results also reveal that BWT does not always achieve a better compression ratio than other compression algorithms.
Thus, the BWT filter in ComZip remains optional. Moreover, compared with the standard BWT, CZ-BWT has a lower compression ratio and slower decompression, so eliminating these weaknesses of CZ-BWT is our future work.

Conclusions and Future Work
The rapid expansion of IoT leads to numerous sensors, which generate massive data and bring challenges for data transmission and storage. Data compression is a valuable way to meet this need, and BWT, which achieves good compression ratios for some kinds of data, can be used in sensors.
However, problems remain in applying BWT to big data in sensors. Given the limited computation resources of each sensor, enlarging the BWT block without sharply reducing the encoding/decoding speed is one problem. If the sensor needs hardware acceleration for BWT, simplifying the complex BWT enough to design the hardware is another.
To solve these problems, this paper presents the CZ-BWT algorithm, a fast truncated BWT algorithm using bucket sort. CZ-BWT is implemented in the shareware ComZip. It currently supports BWT blocks of up to 2 GB and can easily be extended to larger blocks, which meets the requirements of big data compression.
The analyses indicate that CZ-BWT encoding has time complexity O(N) and is faster than BWT encoding with SA-IS. The space complexity of CZ-BWT encoding is also O(N), and it uses less RAM than encoding with SA-IS if the block size is large enough that the RAM for the bucket array can be ignored. The preliminary hardware design of bucket sort suggests that hardware acceleration for CZ-BWT is relatively easy to realize. The experiment results support that ComZip with CZ-BWT is clearly faster than bzip2 and can obtain a better compression ratio than bzip2, p7zip, and ComZip without CZ-BWT for some kinds of data. These results also provide references for the performance of CZ-BWT on x86/64 and ARM platforms, which suggest that using CZ-BWT in current and future sensors is feasible and practical.
On the other hand, these experiment results also confirm the weaknesses of CZ-BWT identified in the analyses: compared with the standard BWT, CZ-BWT has a lower compression ratio and slower decompression. How can the loss of compression ratio in the truncated BWT be analyzed? Can truncated BWT encoding be enhanced to obtain a better compression ratio? Can CZ-BWT be turned into a standard BWT while keeping its advantages? Can the decompression algorithms of ComZip, especially for CZ-BWT, be optimized for better speed? Solving these problems is our future work.