DATA COMPRESSION IN BLACK-GRAY-WHITE BARCODING

ABSTRACT
Context. In this paper the authors propose a data compression method for presenting information in the form of a 2D matrix barcode. The proposed method combines a structural-logical approach with the use of three colors in a barcode instead of the two colors of standard black-and-white barcodes. This approach makes it possible to increase data density while keeping the same area that bicolor barcodes occupy. The paper presents the data compression method and demonstrates the barcoding technology.
Objective. The goal of the work is to develop a data barcoding method that makes it possible to encode more information in the form of a 2D matrix barcode.
Method. A method of tricolor matrix barcoding with compression is proposed. The main idea of the method is to compress input textual information at the stage where an alphanumeric sequence is transformed into a set of barcode patterns, which form the resulting barcode symbol. This is possible due to an intermediate transformation of input characters from the initial notation, which is determined by the cardinality of the input alphabet, to a notation defined by the cardinality of the barcode patterns alphabet. The choice of the input alphabet influences the overall compression, so choosing the initial alphabets for the textual information to be encoded is an important step of the method. The use of three colors instead of the standard two is also an important component in creating a barcode symbol with increased informational density. As ternary notation is used, the second transformation, from the intermediate notation to the ternary one, provides additional compression. The proposed method makes it possible to represent more textual data in a single barcode symbol than bicolor barcoding approaches do.
Results. The method of tricolor matrix barcoding with compression has been developed and described. The authors provide an example of the method applied to test data that were barcoded using the method.
Conclusions. The experiments conducted for this research have confirmed that the proposed method provides higher informational density than black-and-white matrix barcodes. Prospects for further research include studying noise immunity in order to guarantee error-free scanning and increased reliability of the barcode, and extending the barcoding software so that it can be used with any alphabet.


ABBREVIATIONS
BGW-Code is a black-gray-white barcode; DNM is a decimal numbers mode; HNM is a hexadecimal numbers mode; TDM is a textual data mode; ASM is an ASCII symbols mode.

NOMENCLATURE
$B$ is the area of a barcode symbol; $s$ is the number of cells in a barcode pattern; $q$ is the number of colors, or number base; $V$ is the maximum capacity of a barcode symbol; $\Omega$ is the symbolism (alphabet) of a barcode; $\Omega_{inf}$ is the alphabet of informational (textual) symbols; $\Omega_{aux}$ is the alphabet of auxiliary (technical) symbols; $P_X$ is the cardinality of an alphabet $X$; $T$ is the input alphanumeric sequence; $U$ is the resulting sequence of barcode patterns; $n$ is the number of adjacent symbols of the same type in the input sequence $T$; $m$ is the number of barcode patterns; $t_i$ is an element (one character) of the input sequence $T$; $w_i$ is a subsequence consisting of elements $t_i$; $\omega_i$ is a barcode pattern; $u_z$ is a subsequence of barcode patterns.

INTRODUCTION
One of the important problems in the field of modern information technologies is the information support of automated relocation of objects (goods, freight, medical supplies, documents, parcels, etc.) [1]. Automated object relocation systems are based on automated identification, which enters the required data into a computer system by scanning it automatically from the object.
The high demand for automated identification is driven by the desire to improve control over object relocation, reduce production costs, and increase profitability and efficiency.
Among the various types of automated identification, barcoding technology [2,3] has become widespread. This is due to the low cost of both barcode production and scanning equipment.
Barcoding is a way to represent and store information on a carrier using elementary discrete graphical shapes, such as a circle, ellipse, square, rectangle, hatch (straight or oblique), triangle, polygon (hexagon, octagon), etc. Information is represented in the form of combinations of elements with different coloring.
Barcoding provides an optical way of scanning information, including remote scanning [4,5]. A barcode is placed on the surface of an object and moves along with the object throughout its trajectory.
More than 60 years have passed since the invention of barcoding (the first barcode patent was granted in 1952), yet barcoding is still considered an advanced technology. Moreover, many experts believe that barcodes are among the most prominent inventions of the 20th century.
There are a number of barcode types, which can be divided into 3 main groups: linear, stacked, and matrix. A linear barcode can represent from a few to several dozen alphanumeric characters, a stacked barcode several hundred characters, and a matrix barcode up to several thousand characters. The subject of this research is matrix barcoding.
A matrix barcode is a two-dimensional array of discrete graphic items combined into one image. The structure of such an array is called a barcode pattern (BC-pattern). The majority of matrix barcodes are black-and-white. However, in recent years there has been growing interest in the development of multicolor matrix barcodes. The best known among them are Microsoft's High Capacity Color Barcode [6] and the High Capacity Color QR code [7].
Multicolor barcodes provide higher data density than their black-and-white equivalents; however, they have a disadvantage, namely a narrow scope of application. This is due to the still higher cost of color printers compared to black-and-white printers, as well as the high cost of consumables for color printing. Therefore, multicolor barcodes cannot replace black-and-white barcodes in all areas of application, considering that black-and-white printing can be more efficient than color printing in certain use cases.
Since modern printing equipment, namely laser printers, provides high black-and-white printing quality at the required resolution, we consider it appropriate to add one more color, gray, which is not difficult to reproduce with a black-and-white printer. Thus, we propose to create black-gray-white barcode patterns (Fig. 1) with existing black-and-white printing equipment. As a result, the data density of a matrix barcode is expected to increase.
Hereinafter, we call such a black-gray-white barcode a BGW-Code.
However, to achieve a high data density, it is essential to apply a structural-logical approach to increasing barcode density in addition to using the third color.
The object of study is the process of transforming textual information into a barcode.
Creating a barcode symbol becomes more complex because of the data compression procedure, which is an important part of the method proposed in the paper. Therefore, the process of data barcoding consists of two stages: compressing the input information and transforming the compressed data into a barcode symbol.
The subject of study is the methods for data compression and barcode construction.
The methods being developed by the authors are aimed at making it possible to encode more data into a single barcode symbol.
The purpose of the work is to develop a method of forming a barcode symbol with increased informational density.

PROBLEM STATEMENT
The requirements for a barcode as a way to store and input information are as follows:
1. Miniaturization of the barcode pattern (a limited area $B$ is allocated for a barcode pattern $\omega_i$ placed on an item).
2. A significant increase in the capacity $V$ of a barcode pattern $\omega_i$ without changing its geometrical dimensions.
3. Widening the concept of a barcode pattern in order to obtain a portable data file (the barcode pattern has to contain not only an access key to information but the complete information about the item).
Let us consider the problem of increasing the data density of a matrix barcode. Data density can be increased both by the number of colors $q$ used in the barcode and by specific data compression methods, which increase the capacity $V$ of a barcode pattern $\omega_i$.
In this research we consider a tricolor barcode, i.e. $q = 3$, in which black, white, and gray colors are used to represent elements. Such tricolor barcode patterns are easily produced with an ordinary printer.
Increasing data density essentially means increasing the ratio between the length of the initial data sequence to be encoded and the length of the resulting sequence of barcode patterns; this ratio is expressed by a compression coefficient U.
Therefore, the formal definition of the task to be solved in this research is as follows:
Thus, in this paper we present a new method of alphanumeric data compression; the compressed data are then represented in the form of a black-gray-white barcode.

REVIEW OF THE LITERATURE
Methods of information barcoding as well as barcodes themselves are the subject of research for many scientists.
In [8], the author presents a method to generate and decode a two-dimensional color barcode consisting of several blocks: a black-and-white configuration block that encodes auxiliary information about the barcode itself and a set of color data blocks that encode the actual data.
In the patent [9] it is proposed to store information decoded from a barcode in the form of character-based data in an auxiliary field (e.g. a comment field).
The authors of [10] propose a new approach to decoding color barcodes, which does not require a reference color palette. They describe an algorithm in which groups of color bars are decoded at once, exploiting the fact that joint color changes can be represented by a low-dimensional space.
A prototype for generating and reading the HCC2D code format on both PCs and mobile phones is presented in [11]. The authors provide experimental results for different operating scenarios and data densities in comparison with 2-dimensional barcodes.
The authors of [12] describe a method of generating high capacity color barcodes, which works by embedding independent data into two different printer colorant channels via halftone-dot orientation modulation.
In [13], an approach for localization and segmentation of a 2D color barcode read using computer vision techniques is presented. The authors develop a progressive strategy to achieve high accuracy in diverse scenarios together with computational efficiency.
The authors of [14] propose both a system and a method to encode and decode data in a color barcode pattern using dot orientation and color separability. They state that the method is robust against inter-separation misregistration and has a small symbol error rate.
The COBRA system, which is a visible light communication (VLC) system for off-the-shelf smartphones, is presented in [15]. The proposed system is able to encode data into specially designed 2D color barcodes. To achieve this, the authors developed a new COBRA barcode optimized for streaming between the small-size screen and the low-speed camera of smartphones.
As shown above, many different barcoding solutions exist; however, there are still relevant problems concerning the improvement of data representation in barcodes, which require new approaches to data compression.

MATERIALS AND METHODS
A barcode symbol consists of barcode patterns. In turn, a barcode pattern consists of elements, which are matrix cells on a carrier. Each cell can be black, gray, or white.
We assume that the maximum capacity of a barcode symbol equals $V$ barcode patterns. In this case, $V \le 3^s$, where 3 is the number of colors and $s$ is the number of cells in the barcode pattern (Fig. 2, Table 1).
Let the set ASCII be presented as follows:
$$ASCII = L \cup D \cup C,$$
where $L$ is the set of letters, $D$ is the set of digits, and $C$ is the set of special symbols. The sequence $T$ is divided into adjacent subsequences, each consisting of elements that belong to one of the ASCII subsets:
$$T = w_1 w_2 \dots w_k,$$
where each $w_i = t_1 t_2 \dots t_n$ consists of elements of a single ASCII subset.
In practice, the transformation (2) converts an $n$-digit number in the notation $P_A$ into an $m$-digit number in the notation $P_{\Omega_{inf}}$.
The transformation (2) provides compression if $P_A^n - 1 \le P_{\Omega_{inf}}^m - 1$ and, at the same time, $n \lceil \log_3 P_A \rceil > ms$. Thus, for the transformation $w_i \to u_z$ to provide compression, the following condition must be met:
$$\begin{cases} P_A^n - 1 \le P_{\Omega_{inf}}^m - 1, \\ n \lceil \log_3 P_A \rceil > ms, \end{cases} \quad (3)$$
where $ms$ is the number of tricolor cells on a carrier that represent the subsequence $w_i$. Such a transformation is necessary for a space-saving data representation on a carrier and for increasing the data density of barcode patterns while their geometrical dimensions and the carrier size remain unchanged.
We define the degree of input data compression as the ratio of the length of the ternary sequence that corresponds to the alphanumeric sequence $w_i$ to the number of cells on a carrier that represent the subsequence $w_i$ in barcoded form, and refer to it as the compression coefficient:
$$U = \frac{n \lceil \log_3 P_A \rceil}{ms}. \quad (4)$$
Thus, the data compression problem (3) consists in finding, for a fixed $P_{\Omega_{inf}}$, such $P_A$ and parameters $n$ and $m$ that the compression coefficient (4) reaches its maximum value.

EXPERIMENTS
Let us consider $s = 7$, i.e. the large BGW-Code, which gives barcode symbols with a capacity of 2187 barcode patterns (see Table 1). Thus, the symbolism of the barcode comprises 2187 tricolor barcode patterns, each of which consists of 7 cells. The barcode patterns correspond to the numeric set {0, 1, …, 2186}.
Let 15 barcode patterns be considered auxiliary. In this case, 2172 barcode patterns remain for representing information.
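To make the numbering of barcode patterns concrete, the following Python sketch maps a pattern number to its seven ternary cells. The digit order (most significant cell first) and the color assignment 0 = white, 1 = gray, 2 = black are assumptions made only for illustration; the function name `pattern_cells` is hypothetical.

```python
S = 7                                  # cells per barcode pattern (large BGW-Code)
AUXILIARY = 15                         # auxiliary patterns, as stated in the text
COLORS = ("white", "gray", "black")    # assumed mapping of ternary digits to colors


def pattern_cells(number: int, s: int = S) -> list[int]:
    """Return the s ternary digits of a pattern number, most significant first."""
    if not 0 <= number < 3 ** s:
        raise ValueError("pattern number is outside the symbolism")
    digits = []
    for _ in range(s):
        number, digit = divmod(number, 3)
        digits.append(digit)
    return digits[::-1]


total_patterns = 3 ** S                        # 2187 patterns in the symbolism
informational = total_patterns - AUXILIARY     # 2172 patterns, numbered 0..2171

print(total_patterns, informational)
print([COLORS[d] for d in pattern_cells(2172)])
```

Under these assumptions, pattern 0 is seven white cells and pattern 2186 is seven black cells; any other fixed, invertible assignment would serve equally well.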
The inequality system (3) for the large BGW-Code is as follows:
$$\begin{cases} P_A^n - 1 \le 2172^m - 1, \\ n \lceil \log_3 P_A \rceil > 7m. \end{cases} \quad (5)$$
Now we need to solve the system (5) with respect to $P_A$. Only integer values are considered as solutions, and the compression coefficient is calculated according to (4): $U = n \lceil \log_3 P_A \rceil / (7m)$. For each $P_A$, the values of $n$ and $m$ are sought that give the maximum compression coefficient. Fig. 3 shows the dependence of the compression coefficient on the cardinality $P_A$.
Table 2 shows some integer solutions ($P_A$, $n$, $m$) that provide data compression when the data are represented as a barcode.
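Solutions of this kind can be found by a simple exhaustive search. The sketch below assumes that the compression coefficient has the form $U = n \lceil \log_3 P_A \rceil / (ms)$, which reproduces the coefficients quoted in this section, and that $n$ is limited to 16 symbols. The bound on $n$ and the function names are illustrative assumptions, not part of the described method.

```python
def ceil_log(x: int, base: int) -> int:
    """Smallest k such that base**k >= x (integer arithmetic only)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


def best_transformation(p_a: int, p_inf: int = 2172, s: int = 7, max_n: int = 16):
    """Return (n, m, U) with the largest U > 1 for n = 1..max_n, or None."""
    cells_per_symbol = ceil_log(p_a, 3)        # ternary digits per input symbol
    best = None
    for n in range(1, max_n + 1):
        m = ceil_log(p_a ** n, p_inf)          # fewest patterns that can hold n symbols
        u = n * cells_per_symbol / (m * s)
        if u > 1 and (best is None or u > best[2]):
            best = (n, m, u)
    return best


for p_a in (10, 28, 88, 267):
    n, m, u = best_transformation(p_a)
    print(f"P_A = {p_a}: {n} -> {m}, U = {u:.3f}")
```

With the assumed bound, the search returns the transformations "10" → "3", "16" → "7", "12" → "7", and "11" → "8" for the four alphabets discussed in this section. A larger bound on $n$ can yield slightly different solutions for some alphabets.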
If $w_i$ is a subsequence of decimal digits, the maximum compression, with a compression coefficient of 1.429, is achieved when each 10-digit subsequence of adjacent decimal digits corresponds to 3 barcode patterns. In practice, the transformation "10" → "3" means that a 10-digit decimal number is transformed into a 3-digit number in the notation 2172.
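The transformation "10" → "3" is an ordinary change of number base, which can be sketched as follows. The function names and the most-significant-first ordering of the three pattern numbers are assumptions made for illustration.

```python
BASE = 2172   # informational patterns of the large BGW-Code
N, M = 10, 3  # transformation "10" -> "3"


def encode_decimal_block(block: str) -> list[int]:
    """Convert a 10-digit decimal string into 3 pattern numbers (most significant first)."""
    if len(block) != N or not (block.isascii() and block.isdigit()):
        raise ValueError("expected exactly 10 decimal digits")
    value = int(block)
    numbers = []
    for _ in range(M):
        value, digit = divmod(value, BASE)
        numbers.append(digit)
    return numbers[::-1]


def decode_decimal_block(numbers: list[int]) -> str:
    """Restore the 10-digit decimal string, including leading zeros."""
    value = 0
    for number in numbers:
        value = value * BASE + number
    return str(value).zfill(N)


encoded = encode_decimal_block("0123456789")
print(encoded)                        # [26, 368, 309]
print(decode_decimal_block(encoded))  # 0123456789
```

Three base-2172 digits are sufficient because $2172^3 = 10\,246\,592\,448 \ge 10^{10}$, so every 10-digit block fits and every pattern number stays within 0 ÷ 2171.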
The alphabet with $P_A = 28$, which is composed of the 16 hexadecimal digits {0 - F} and 12 other arbitrary symbols, such as letters and special characters, makes it possible to represent hexadecimal sequences, e.g. exe-files, in the form of a barcode. In this case, we use the transformation "16" → "7", where each 16-digit subsequence corresponds to 7 barcode patterns, which means that a 16-digit number in the notation 28 is transformed into a 7-digit number in the notation 2172. With this transformation, a 16-digit subsequence is compressed with a compression coefficient of 1.306 (see Table 2).
The alphabet with $P_A = 267$ makes it possible to represent input information composed of any ASCII(256) symbols in the form of a barcode. In this case, the compression coefficient equals 1.179 and each subsequence consisting of 11 alphanumeric symbols corresponds to 8 barcode patterns, i.e. the transformation "11" → "8" means that an 11-digit number in the notation 267 is transformed into an 8-digit number in the notation 2172.
From Fig. 3 and Table 2, we can conclude that it is most appropriate to use the following 4 compression modes for an input alphanumeric sequence $T$ (see Fig. 4):
- DNM, Decimal Numbers Mode: $P_A = 10$, the mode for compressing decimal subsequences;
- HNM, Hexadecimal Numbers Mode: $P_A = 28$, the mode for compressing hexadecimal subsequences;
- TDM, Textual Data Mode: $P_A = 88$, the mode for compressing alphanumeric data (the number of distinct symbols in the text shall not exceed 88);
- ASM, ASCII Symbols Mode: $P_A = 267$, the mode for compressing subsequences composed of ASCII(256) symbols.
To switch between the modes, the switch symbols $\omega_{2172} \div \omega_{2181}$, which correspond to the appropriate auxiliary barcode patterns (the mode switchers), are used. For instance, the mode switcher $\omega_{2172}$ corresponds to barcode pattern number 2172 in the barcode symbolism and provides a transition from ASM to DNM.
Before a barcode image is plotted on a carrier, the input alphanumeric sequence (1), which is to be represented as a barcode, is reduced to the following form:
In TDM, the transformation "12" → "7" is performed. In ASM, the transformation "11" → "8" is performed.
As a result of these transformations, an array of numbers from the range 0 ÷ 2171 is obtained instead of the input alphanumeric sequence $T$. Then, each number is replaced by the appropriate barcode pattern from the symbolism and is arranged on a carrier. The barcode symbol can have either a square or a rectangular shape.
The method of compressing alphanumeric information that is to be represented as a BGW-Code is as follows:
1. Taking into account the parameter $V$, which is the capacity of a barcode symbol, the cardinality of the symbolism $\Omega$ is defined as $P_\Omega = 3^s$, where $s = \lceil \log_3 V \rceil$.
2. Mode transition rules shall be defined.
5. An appropriate alphabet is formed for each mode.
6. Rules are formed for partitioning input alphanumeric sequences into subsequences of adjacent symbols, each of which consists only of symbols of the appropriate mode alphabet.
7. Each obtained subsequence is processed by the rules of the appropriate mode and transformed into numeric form, which is a sequence of numbers from the range $0 \div P_{\Omega_{inf}} - 1$.
The proposed method of data compression can be used for arbitrary symbol sequences entered from a keyboard.
As a result of the analysis of the alphanumeric sequence (6), which is performed by the appropriate software (the syntax analyzer), the following string is obtained:
In this string, 2 mode switchers are inserted: $\omega_{2176}$, to switch from TDM to DNM, and $\omega_{2177}$, to switch from DNM to TDM. Each 12-symbol subsequence (there are 6 such subsequences) shall be replaced by 7 numbers (the transformation "12" → "7") from the range 0 ÷ 2171, and each 10-digit subsequence consisting of decimal digits shall be replaced by 3 numbers (the transformation "10" → "3") from the same range 0 ÷ 2171.
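The partitioning step performed by the syntax analyzer can be sketched as follows. The rule used here is an assumption made for illustration: digit runs are sent to DNM only in whole 10-digit blocks, everything else stays in TDM, and encoding starts in TDM. The actual rules of the authors' software are not given in the text, and the function names are hypothetical; only the switcher numbers 2176 and 2177 are taken from the text.

```python
import re

TDM_TO_DNM = 2176   # switcher numbers quoted in the text
DNM_TO_TDM = 2177


def split_modes(text: str) -> list[tuple[str, str]]:
    """Split text into (mode, chunk) pieces under the assumed 10-digit-block rule."""
    pieces, position = [], 0
    for match in re.finditer(r"(?:[0-9]{10})+", text):
        if match.start() > position:
            pieces.append(("TDM", text[position:match.start()]))
        pieces.append(("DNM", match.group()))
        position = match.end()
    if position < len(text):
        pieces.append(("TDM", text[position:]))
    return pieces


def with_switchers(pieces: list[tuple[str, str]]) -> list:
    """Insert mode switcher numbers between pieces, assuming encoding starts in TDM."""
    stream, mode = [], "TDM"
    for piece_mode, chunk in pieces:
        if piece_mode != mode:
            stream.append(TDM_TO_DNM if piece_mode == "DNM" else DNM_TO_TDM)
            mode = piece_mode
        stream.append(chunk)
    return stream


print(with_switchers(split_modes("Order 0123456789 ok")))
```

Each TDM chunk would then be cut into 12-symbol blocks and each DNM chunk into 10-digit blocks before the base conversion; that step is omitted here.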
Eventually, a numeric sequence of 47 numbers from the range 0 ÷ 2186 is obtained (the symbolism $\Omega$ is composed of 2172 informational barcode patterns with numbers 0 ÷ 2171 and 15 auxiliary barcode patterns with numbers 2172 ÷ 2186), including the 2 mode switchers $\omega_{2176}$ and $\omega_{2177}$, which correspond to the numbers 2176 and 2177:
To obtain a barcode symbol, each of the 47 numbers is replaced by the corresponding barcode pattern consisting of 7 tricolor cells (see Fig. 2).
Since the barcode symbol must have a rectangular shape, one more barcode pattern is added to the 47 patterns of the barcode: $\omega_{2182}$, which represents the Pad symbol, a placeholder.
Thus, the barcode symbol presented in Fig. 1 comprises 336 tricolor cells, as 48 barcode patterns × 7 cells = 336. The dimension of the barcode symbol is 16 × 21 cells.
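The pattern count and the padding step can be checked with a short sketch. It assumes that the symbol is laid out in rows of 21 cells, one possible reading of the 16 × 21 dimension, and uses placeholder pattern numbers instead of real data; the function name is hypothetical.

```python
CELLS_PER_PATTERN = 7
PAD = 2182            # Pad pattern number quoted in the text


def pad_to_rectangle(patterns: list[int], width: int = 21) -> list[int]:
    """Append Pad patterns until the cells fill complete rows of `width` cells."""
    padded = list(patterns)
    while (len(padded) * CELLS_PER_PATTERN) % width:
        padded.append(PAD)
    return padded


# 6 TDM blocks ("12" -> "7"), 1 DNM block ("10" -> "3") and 2 mode switchers
pattern_count = 6 * 7 + 1 * 3 + 2
symbol = pad_to_rectangle([0] * pattern_count)   # placeholder numbers, not real data
cells = len(symbol) * CELLS_PER_PATTERN

print(pattern_count, len(symbol), cells, cells // 21)
```

Under these assumptions the sketch gives 47 patterns before padding, 48 after, and 336 cells arranged in 16 rows of 21 cells.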

DISCUSSION
Let us consider the obtained results in order to discuss efficiency of the method.
If the textual sequence (6), which consists of 82 symbols of an alphabet with cardinality $P_A = 88$, is represented on a carrier as a black-and-white barcode image, it requires $82 \lceil \log_2 88 \rceil = 574$ black-and-white cells. As a result of using both three colors and the proposed compression method, it takes 336 tricolor cells (see Fig. 1) to represent the textual sequence (6).
Thus, the proposed compression method increases data density with the compression coefficient 410/336 = 1.22, where $410 = 82 \lceil \log_3 88 \rceil$ is the number of tricolor cells required without compression. The total effect of the transition from a two-color to a tricolor image together with the compression method is compression with the coefficient 574/336 = 1.708.
The increase in data density by a factor of 1.708 is due to trichromatism (1.4) and the compression method (1.22). Indeed, 1.4 × 1.22 = 1.708. Thus, the multicolor barcoding method proposed in the paper makes it possible to noticeably increase the amount of information that can be stored in the form of a barcode. In the example above, the gain in data density is about 70%. Depending on the area of application and the specific use case, this can bring significant benefits, such as autonomous access to a large amount of actual data instead of keeping some general information with a link to more data, which is a much more convenient, reliable, and in some cases even more secure way to get information.
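The arithmetic of this comparison can be reproduced with the following sketch. The helper `ceil_log` is a hypothetical name; it computes the ceiling of a logarithm with integers only, so that floating-point rounding cannot affect the cell counts.

```python
def ceil_log(x: int, base: int) -> int:
    """Smallest k such that base**k >= x (integer arithmetic only)."""
    k, power = 0, 1
    while power < x:
        power *= base
        k += 1
    return k


symbols, alphabet = 82, 88
binary_cells = symbols * ceil_log(alphabet, 2)    # 82 * 7 = 574 black-and-white cells
ternary_cells = symbols * ceil_log(alphabet, 3)   # 82 * 5 = 410 tricolor cells
bgw_cells = 48 * 7                                # 336 cells of the BGW-Code symbol

color_gain = binary_cells / ternary_cells         # effect of the third color
compression_gain = ternary_cells / bgw_cells      # effect of the compression method
total_gain = binary_cells / bgw_cells             # combined effect

print(binary_cells, ternary_cells, bgw_cells)
print(round(color_gain, 3), round(compression_gain, 3), round(total_gain, 3))
```

The combined gain is the product of the two partial gains, since the intermediate count of 410 cells cancels out.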

CONCLUSIONS
Although barcodes are widely used in many fields of human activity, there are still various open issues concerned with encoding information. One such problem is barcoding more data within the same area of the barcode's graphical representation.
The scientific novelty of the obtained results is that a method of tricolor barcoding with compression is proposed for the first time. Compression is achieved by transforming input data into barcode patterns. The proposed transformation method transforms a subsequence of input characters into a shorter subsequence of barcode patterns, which then form the resulting barcode symbol. The use of three colors in the barcode ensures additional compression due to the use of ternary notation. The combination of these two approaches makes it possible to barcode more information within the same area of a barcode symbol than would be possible with binary notation without compression.
The practical significance of the proposed method is that more textual information can be encoded in the form of a single barcode symbol. It can be used in various practical applications where the size of the overall barcode symbol is essential, especially when a large amount of data is to be barcoded.
Prospects for further research are to study noise immunity, which must be considered in order to guarantee error-free scanning and increase the reliability of the barcode, and to extend the barcoding software so that it can be used with any language: not only the Latin and Cyrillic alphabets but also other specific alphabets, such as Korean, Georgian, Arabic, etc., and hieroglyphics.

Figure 1 - An example of a BGW-Code symbol

The maximum capacity is $V_{max} = 3^s$ barcode patterns. The set of all possible barcode patterns with a fixed $s$ forms the alphabet $\Omega$ of cardinality $P_\Omega = 3^s$. Let us call this alphabet the symbolism of the barcode. The symbolism consists of informational patterns $\Omega_{inf}$ and auxiliary patterns $\Omega_{aux}$. To represent information on a carrier, we use $P_{\Omega_{inf}}$ informational barcode patterns. Auxiliary patterns are used to switch between encoding modes, to indicate the START and STOP barcode patterns, and to set up a scanner.
Each $w_i = t_1 t_2 \dots t_n$ is a subsequence of the input sequence that contains elements $t_i$ of only one of the ASCII subsets, namely $L$, $D$, or $C$. The subsequences $w_1, w_2, \dots, w_k$ can occur in any order in the input sequence $T$. Let the alphanumeric symbols $t_i$ belong to an alphabet $A$ that is a subset of ASCII, i.e. $t_i \in A$, $A \subset ASCII$. The cardinality of the alphabet $A$ is denoted $P_A$. The alphabet $A$ corresponds to the numeric set $\{0, 1, \dots, P_A - 1\}$, which represents the ordinal numbers of the symbols in the alphabet. Now we turn the subsequence $w_i = t_1 t_2 \dots t_n$, formed from the symbols of the alphabet $A$, into a barcode. In barcode form, the subsequence $t_1 t_2 \dots t_n$ corresponds to a subsequence $u_z$ consisting of $m$ barcode patterns: $u_z = \omega_1 \omega_2 \dots \omega_m$, where $\omega_i \in \Omega_{inf}$. In turn, the alphabet $\Omega_{inf}$ corresponds to the numeric set $\{0, 1, \dots, P_{\Omega_{inf}} - 1\}$, as the barcode patterns of the symbolism that are used for representing textual data can be numbered from 0 to $P_{\Omega_{inf}} - 1$. The above-mentioned transformation is written as
$$w_i \to u_z. \quad (2)$$
In condition (3), $P_A^n - 1$ and $P_{\Omega_{inf}}^m - 1$ are the quantitative equivalents of, respectively, the maximal $n$-digit number in the notation $P_A$ and the maximal $m$-digit number in the notation $P_{\Omega_{inf}}$, and $n \lceil \log_3 P_A \rceil$ is the length of the ternary sequence that corresponds to the alphanumeric sequence $t_1 t_2 \dots t_n$.
The subsequence $t_1 t_2 \dots t_n$ is to be transformed into an $m$-long subsequence of barcode patterns $u_z = \omega_1 \omega_2 \dots \omega_m$.

Figure 3 - Dependence of the compression coefficient on the cardinality $P_A$ of the alphabet when $s = 7$

Figure 4 - Interconnection between the compression modes

Represented as a black-and-white image, the sequence (6) requires $82 \lceil \log_2 88 \rceil = 574$ black-and-white cells. If the same sequence (6) is represented as a tricolor BGW image, it requires $82 \lceil \log_3 88 \rceil = 410$ tricolor cells. In other words, the data density of the barcode symbol increases by a factor of 574/410 = 1.4. This is due to the transition from a two-color to a tricolor image.

Table 2 - Some integer solutions of the system (5) for $s = 7$ (columns: $P_A$; type of transformation, $n \to m$; compression coefficient $U$)