Image texture model based on energy features

In information processing and control systems based on computer vision technologies, an effective feature representation of images must be provided for their subsequent analysis. One widely used approach is the construction of texture models. In this paper, a texture description based on energy features is considered. The proposed model is a set of pixel weights that reflect the significance of each pixel for image perception. Pixel significance is estimated from the energy of the coefficients of an orthogonal discrete multiresolution wavelet transform. The paper presents expressions for calculating the pixel weights and shows that the resulting texture models can be used to classify images.

Image features are represented as a vector of numerical values, formed in two stages. At the first stage, image segmentation is performed: a set of pixels (image segments) that meet a given criterion is determined. At the second stage, feature values reflecting the properties of the image segments are calculated.
The set of feature values is an image model that reflects the image's properties from the point of view of specific information extraction and analysis tasks. The effectiveness of subsequent analysis and of decision-making based on its results largely depends on the quality of this model. Therefore, the development of effective feature models is relevant and practically significant.

Texture analysis of images
One of the widely used approaches to describing images is texture analysis. Within this approach, transformations are carried out that represent an image as a texture model, i.e. a set of texture features, some of which were listed above.
According to the reviews [15,16], the following texture description methods can be distinguished: 1) geometric (structural) methods; 2) statistical methods; 3) model-based methods; 4) signal processing methods (transformation-based methods). Geometric (structural) approaches represent texture with well-defined primitives (microtexture) and a hierarchy of spatial arrangements (macrotexture) of these primitives. To describe a texture, one must define the primitives and the placement rules. The choice of a primitive (from a set of primitives) and the probability that the chosen primitive is placed at a particular location can be a function of the location or of the primitives near that location. The advantage of the structural approach is that it provides a good symbolic description of the image; however, this feature is more useful for synthesis than for image analysis. Abstract descriptions can be poorly defined for natural textures because of the variability of both micro- and macrostructure and the lack of a clear distinction between them.
Unlike geometric (structural) methods, statistical approaches do not attempt to describe the hierarchical structure of a texture explicitly. Instead, they represent the texture indirectly through non-deterministic properties defined by the distributions of and relationships between pixels. Methods based on second-order statistics allow high-quality texture recognition.
Model-based texture analysis methods are based on the construction of an image model, which can be used not only to describe the texture, but also to synthesize it. The model parameters reflect the essential properties of the texture. This approach uses generative and stochastic models for texture interpretation. The model parameters are estimated and then used to analyze the images.
According to the monograph [19], approaches to texture representation can be grouped into three main classes: statistical, syntactic, and hybrid. Statistical methods are suitable if the sizes of the primitives are comparable to the pixel size. Syntactic and hybrid methods are more suitable for textures whose primitives can be assigned a label (the primitive type), which means that the primitives can be described using a greater variety of properties than tonal properties alone (for example, a shape description can be used).
Methods based on edge analysis and multiscale representation are of particular importance in our work. Edge analysis relies on appropriate edge detectors. To describe a texture, for example, the gradient function g(d) can be used:

g(d) = Σ_i Σ_j ( |f(i, j) − f(i + d, j)| + |f(i, j) − f(i, j + d)| ),

where d is the distance between pixels horizontally or vertically; f is the image; i, j are the coordinates of an image pixel. The function g(d) behaves like a negative autocorrelation function: its minimum (maximum) corresponds to the maximum (minimum) of the autocorrelation function. The dimension of the texture feature space is given by the number of distance values d used to calculate the gradient.
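As an illustration, the gradient function g(d) can be computed as follows (a minimal C++ sketch; the `Image` container is an assumption of this example, not part of the method):

```cpp
#include <cmath>
#include <vector>

// Hypothetical row-major grayscale image container used for this sketch.
struct Image {
    int rows, cols;
    std::vector<double> px;                        // row-major pixel data
    double at(int i, int j) const { return px[i * cols + j]; }
};

// g(d): sum of absolute brightness differences between pixels separated
// by distance d horizontally and vertically (pairs leaving the image
// are skipped).
double gradient_g(const Image& f, int d) {
    double sum = 0.0;
    for (int i = 0; i < f.rows; ++i)
        for (int j = 0; j < f.cols; ++j) {
            if (i + d < f.rows) sum += std::fabs(f.at(i, j) - f.at(i + d, j));
            if (j + d < f.cols) sum += std::fabs(f.at(i, j) - f.at(i, j + d));
        }
    return sum;
}
```

Evaluating g(d) for several values of d yields the coordinates of the texture feature vector mentioned above.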
Texture properties can also be derived from first- and second-order edge distribution statistics:
1) graininess: the finer the texture, the more edges are present in its edge image;
2) contrast: high-contrast textures are characterized by high values of the gradient modulus;
3) randomness: the random scatter of pixels is characterized by the entropy of the histogram of edge magnitudes;
4) directivity: an approximate measure of directivity is the entropy of the edge direction histogram; directional textures have an even number of significant histogram peaks, while non-directional textures have a uniform edge direction histogram;
5) linearity: the linearity of the texture is determined by the co-occurrence of pairs of edges with the same direction at constant distances, when some edges lie in the direction of other edges;
6) periodicity: the periodicity of the texture can be determined from the co-occurrence of pairs of edges of the same direction at constant distances in directions perpendicular to the direction of other edges;
7) size: the size of the texture can be based on the co-occurrence of pairs of edges with opposite directions at a constant distance in a direction perpendicular to the directions of other edges.
Another edge-based texture feature is edge density (the number of edges per unit area):

edge_density = (1 / N) card{ p | Mag(p) ≥ T },

where p is a pixel of the image area; Mag(p) is the edge magnitude at pixel p; N is the number of pixels in the image area; T is the threshold value. This characteristic can be extended to take into account both the density of the texture and its orientation properties. For this purpose, normalized histograms of the gradient magnitude H_mag(R) and direction H_dir(R) over the area R are constructed (the histograms are normalized by dividing the bin frequencies by the area size).
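The edge density and the normalized magnitude histogram can be sketched as follows (a C++ sketch; the gradient magnitudes Mag(p) are assumed to be precomputed, e.g. by a Sobel operator, and the histogram range and bin count are assumptions of this example):

```cpp
#include <vector>

// Fraction of pixels whose gradient magnitude reaches the threshold T.
double edge_density(const std::vector<double>& mag, double T) {
    long edges = 0;
    for (double m : mag)
        if (m >= T) ++edges;                     // pixel counts as an edge
    return mag.empty() ? 0.0 : static_cast<double>(edges) / mag.size();
}

// Normalized gradient-magnitude histogram over an area, with magnitudes
// assumed to lie in [0, maxMag].
std::vector<double> mag_histogram(const std::vector<double>& mag,
                                  double maxMag, int bins) {
    std::vector<double> h(bins, 0.0);
    for (double m : mag) {
        int b = static_cast<int>(m / maxMag * bins);
        if (b >= bins) b = bins - 1;             // clamp the top value
        h[b] += 1.0;
    }
    for (double& v : h) v /= mag.size();         // normalize by area size
    return h;
}
```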
These histograms usually contain a small fixed number of bins (about 10). The pair of histograms (H_mag(R), H_dir(R)) gives a quantitative description of the texture of the area R.
Description of the texture based on the multiscale representation relies on the wavelet transform. In general, the wavelet transform of a function is expressed as

Wf(u, s) = ∫ f(x) (1 / s^(D/2)) ψ*((x − u) / s) dx,   (4)

where Wf is the transform result; f is the original function; ψ* is the complex conjugate of the shifted and scaled function ψ, which has zero mean, center at the origin, and unit norm; D is the dimension of the signal; u is the D-dimensional vector of shift parameters; s is the scale parameter [24]. The wavelet transform decomposes the signal over basis functions of the form

ψ_{u,s}(x) = (1 / s^(D/2)) ψ((x − u) / s).

The wavelet transform in the form (4) is continuous. When discrete sets of shift and scale parameters are used, we obtain a discrete wavelet transform. A special case of the discrete wavelet transform is the multiresolution wavelet transform.
Wavelet analysis procedures are based on the wavelet transform. Its advantages include the ability to identify local spatial and frequency characteristic features of images [25]. In addition, fast algorithms exist for orthogonal discrete multiresolution wavelet transforms.
The wavelet transform provides a multiscale description of image texture. The most commonly used wavelet-based texture features are wavelet energy signatures and their second-order statistics.
As a result of the wavelet transform, the image is represented by the set of coefficients

y = {a_L(m, n), d_L(m, n), …, d_1(m, n)},

where N is the number of frequency ranges (channels); L is the number of decomposition levels; a_L(m, n) are the approximation coefficients; d_L(m, n), …, d_1(m, n) are the detail coefficients; m, n are coefficient coordinates. For a two-dimensional image, each level contributes three detail channels, so N = 3L + 1. Consequently, the texture can be described by a set of N first-order probability density functions p(y_i), i = 1, …, N. A more compact representation is obtained using a set of variance features

v_i = Var{y_i}, i = 1, …, N,

where v_i is the variance of the i-th frequency range (i-th channel) and Var is the variance operator. The channel variances v_i can be estimated from the mean sum of squares over the region of interest R of the analyzed texture:

v_i = (1 / N_R) Σ_{(m,n)∈R} y_i(m, n)²,

where N_R is the number of pixels in the region R. To obtain a better variance estimate for the low-frequency range, it is usually recommended to subtract the square of the average image brightness from the obtained value. Next, we consider the approach developed by the authors for constructing a texture model of images based on energy features calculated using the orthogonal discrete multiresolution Haar wavelet transform. This approach combines the variance features typical of the multiscale representation of image textures with edge analysis.
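Returning to the variance features, the channel variance estimate can be sketched as follows (a minimal C++ sketch; the coefficients of one channel over the region R are assumed to be given as a flat list, and the optional mean subtraction implements the recommendation for the low-frequency range):

```cpp
#include <vector>

// Variance estimate v_i for one wavelet channel over a region:
// mean sum of squared coefficients, optionally minus the squared mean.
double channel_variance(const std::vector<double>& coeffs, bool subtract_mean) {
    if (coeffs.empty()) return 0.0;
    double sum = 0.0, sumsq = 0.0;
    for (double c : coeffs) { sum += c; sumsq += c * c; }
    double n = static_cast<double>(coeffs.size());
    double v = sumsq / n;                            // (1/N_R) * sum of squares
    if (subtract_mean)
        v -= (sum / n) * (sum / n);                  // low-frequency correction
    return v;
}
```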

Energy features
The energy features of an image are estimates of the energy values of pixel attributes. The set of energy features forms a weight model of the image: a set of weights that reflect the significance of the image pixels from the point of view of its perception [26].
In the statistical approach, a texture is defined through non-deterministic properties related to the distributions of and relationships between brightness values (gray levels) in the image. From this point of view, the texture can be evaluated by analyzing changes in brightness values.
Changes in brightness values characterize the local features of an image. According to research in biological vision, such features are the most important for the perception of scenes in images. In turn, a relatively large brightness change at a pixel characterizes it as belonging to an edge. The density of edges in different areas reflects the texture of the image as a whole. Thus, by evaluating the significance of each pixel for image perception, a texture model can be built.
In this paper, it is proposed to use the associated wavelet transform coefficients as pixel attributes for assessing pixel significance. The simplest and fastest transform is the orthogonal multiresolution Haar transform. When it is used, the original image itself is regarded as the result of a wavelet decomposition at the last level J: it is the matrix of approximation coefficients LL_J, whose values are equal to the brightness values of its pixels (the matrices of detail coefficients LH_J, HL_J and HH_J of this level are assumed to contain only zeros). The coefficients of the matrices LL_{j−1}, LH_{j−1}, HL_{j−1} and HH_{j−1} of level j − 1 are calculated from the approximation coefficients LL_j of level j (j = J, …, j_0, where j_0 is the initial level) by the following expressions:

LL_{j−1}(m, n) = (LL_j(2m, 2n) + LL_j(2m, 2n+1) + LL_j(2m+1, 2n) + LL_j(2m+1, 2n+1)) / 2,
LH_{j−1}(m, n) = (LL_j(2m, 2n) + LL_j(2m, 2n+1) − LL_j(2m+1, 2n) − LL_j(2m+1, 2n+1)) / 2,
HL_{j−1}(m, n) = (LL_j(2m, 2n) − LL_j(2m, 2n+1) + LL_j(2m+1, 2n) − LL_j(2m+1, 2n+1)) / 2,
HH_{j−1}(m, n) = (LL_j(2m, 2n) − LL_j(2m, 2n+1) − LL_j(2m+1, 2n) + LL_j(2m+1, 2n+1)) / 2.

Figures 1 and 2 show the original standard images and the results of their three-level orthogonal multiresolution Haar wavelet transform (from top to bottom and from left to right: boat, cameraman, house, jetplane, lake, livingroom, mandril, peppers, pirate). The original images were obtained from the sites [27,28]. They are grayscale images of 512 × 512 pixels. Figure 2 shows copies of the original images, which are the matrices of approximation coefficients of level j_0, together with the matrices of scaled absolute values of the detail coefficients; coefficients with larger absolute values are shown as darker pixels.
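One analysis level of this transform can be sketched as follows (a self-contained C++ sketch without OpenCV; division by 2 corresponds to the orthonormal Haar normalization, and the assignment of the LH/HL names to the horizontal and vertical detail subbands is an assumption of this sketch, since conventions vary):

```cpp
#include <vector>

// Result of one Haar analysis level: approximation LL and detail
// matrices LH, HL, HH, each half the size of the input.
struct HaarLevel {
    std::vector<std::vector<double>> LL, LH, HL, HH;
};

// Compute level j-1 from the approximation matrix LL_j (even dimensions).
HaarLevel haar_analyze(const std::vector<std::vector<double>>& a) {
    int R = static_cast<int>(a.size()) / 2;
    int C = static_cast<int>(a[0].size()) / 2;
    HaarLevel out;
    out.LL.assign(R, std::vector<double>(C));
    out.LH.assign(R, std::vector<double>(C));
    out.HL.assign(R, std::vector<double>(C));
    out.HH.assign(R, std::vector<double>(C));
    for (int m = 0; m < R; ++m)
        for (int n = 0; n < C; ++n) {
            double p = a[2*m][2*n],   q = a[2*m][2*n+1];
            double r = a[2*m+1][2*n], s = a[2*m+1][2*n+1];
            out.LL[m][n] = (p + q + r + s) / 2;   // approximation
            out.LH[m][n] = (p + q - r - s) / 2;   // horizontal details
            out.HL[m][n] = (p - q + r - s) / 2;   // vertical details
            out.HH[m][n] = (p - q - r + s) / 2;   // diagonal details
        }
    return out;
}
```

Applying `haar_analyze` repeatedly to the returned LL matrix yields the multi-level decomposition shown in figure 2.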
When considering figure 2, it is necessary to take into account the layout of the matrices of the detailing coefficients of the multiresolution wavelet transform. In this case, the scheme shown in figure 3 is used.
The advantage of the orthogonal multiresolution Haar wavelet transform for image processing is the ability to localize the points of brightness change. Indeed, when comparing figures 1 and 2, it is clear that significant changes in brightness correspond to significant detailing coefficients in absolute value. At the same time, there is also a correspondence between the values of the detailing coefficients of different scales (levels).
There are various ways to calculate the weights for estimating pixel significance from the coefficients of the orthogonal multiresolution Haar wavelet transform. One of them is defined as follows. The energy of the detail coefficients at position (m_j, n_j) of level j is

E_j(m_j, n_j) = LH_j(m_j, n_j)² + HL_j(m_j, n_j)² + HH_j(m_j, n_j)².   (14)

The weights are accumulated recursively from the initial level j_0 towards the pixel level J:

w_{j_0}(m, n) = E_{j_0}(m, n),
w_{j+1}(m_{j+1}, n_{j+1}) = k_{1,j} w_j(⌊m_{j+1}/2⌋, ⌊n_{j+1}/2⌋) + k_{2,j} E_{j+1}(m_{j+1}, n_{j+1}),   (15)

where k_{1,j} and k_{2,j} are the scale factors of level j (by the assumption above, E_J ≡ 0, since the detail matrices of level J contain only zeros). The weights are calculated using expressions (14) and (15) and then normalized:

w'_J(m_J, n_J) = w_J(m_J, n_J) / max{w_J(m_J, n_J)},   (16)

where max{w_J(m_J, n_J)} is the maximum weight value. The weight models of the original standard images of figure 1 are shown in figure 4; when a weight model is visualized, the more significant weight values are shown as darker pixels. It should be noted that the weight models characterize the contribution of each pixel to the perception of the image in terms of the significance of the brightness changes at that pixel.
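A possible implementation of the weight model is sketched below. It accumulates detail-coefficient energies from the coarsest level towards the pixel level and normalizes by the maximum weight; the constant (rather than per-level) scale factors k1 and k2 and the parent indexing by integer halving are simplifying assumptions of this sketch, not the exact expressions (14)-(16) of the method.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Energy of the detail coefficients at one position of a level.
double detail_energy(const Mat& LH, const Mat& HL, const Mat& HH,
                     std::size_t m, std::size_t n) {
    return LH[m][n] * LH[m][n] + HL[m][n] * HL[m][n] + HH[m][n] * HH[m][n];
}

// LH[0], HL[0], HH[0] hold the coarsest level j0; each following entry
// is the next finer level (twice the size in each dimension).
Mat weight_model(const std::vector<Mat>& LH, const std::vector<Mat>& HL,
                 const std::vector<Mat>& HH, double k1, double k2) {
    Mat w(LH[0].size(), std::vector<double>(LH[0][0].size()));
    for (std::size_t m = 0; m < w.size(); ++m)
        for (std::size_t n = 0; n < w[0].size(); ++n)
            w[m][n] = detail_energy(LH[0], HL[0], HH[0], m, n);
    for (std::size_t j = 1; j < LH.size(); ++j) {   // towards the pixel level
        Mat next(LH[j].size(), std::vector<double>(LH[j][0].size()));
        for (std::size_t m = 0; m < next.size(); ++m)
            for (std::size_t n = 0; n < next[0].size(); ++n)
                next[m][n] = k1 * w[m / 2][n / 2]
                           + k2 * detail_energy(LH[j], HL[j], HH[j], m, n);
        w = std::move(next);
    }
    double wmax = 0.0;                              // normalize by the maximum
    for (const auto& row : w) for (double v : row) wmax = std::max(wmax, v);
    if (wmax > 0.0) for (auto& row : w) for (double& v : row) v /= wmax;
    return w;
}
```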
The source images considered above do not have a regular texture, i.e. they are weakly textured images. In contrast, the images shown in figure 5 are characterized by a more pronounced texture. They are the texture images D1-D9 from the well-known Brodatz database [29], reduced to 512 × 512 pixels. Figures 6 and 7 show the results of the orthogonal multiresolution Haar wavelet transform and examples of weight models of these images, respectively.

Texture model
Based on the proposed energy features, a texture model of the image can be built using the following steps:
1. Split the image into rectangular regions R_1, …, R_N.
2. For each region R_i, i = 1, …, N, calculate the average weight of the pixels belonging to it:

w̄(R_i) = (1 / |R_i|) Σ_{(m,n)∈R_i} w'_J(m, n),

where |R_i| is the number of pixels in the region R_i.
3. Form the feature vector v = (w̄(R_1), …, w̄(R_N)).
Table 1 shows the results of comparing the texture descriptions of the standard images boat, cameraman, house, jetplane, lake, livingroom, mandril, peppers, and pirate (designated in the table as I_1-I_9, respectively). When calculating the weights, the scale factors k_{1,j} of all levels j from j_0 to J − 1 are equal to 0.25, and the scale factors k_{2,j} of all levels j from j_0 to J − 1 are equal to 1. The initial level j_0, the final level J, and the number of wavelet decomposition levels L are 6, 9, and 3, respectively. The number of texture features per image is 64 (each image is evenly divided into 64 regions). The mean square error criterion calculated over the feature vectors was used to compare the images. Table 2 shows the results of a similar comparison of the descriptions of the texture images D1-D9, obtained with the same parameters as for the standard images.
The presented method of forming texture features has a relatively high computational speed, which was confirmed experimentally. For the experiments, we used a software implementation developed in C++ in Microsoft Visual Studio 2017 with the computer vision library OpenCV 3.4.9. The average processing time for grayscale images of 512 × 512 pixels was 0.007 s. The experiments were carried out on a personal computer with a quad-core Intel(R) Core(TM) i5-8300H CPU @ 2.30 GHz and 8 GB RAM under Microsoft Windows 10. A further advantage of the method is the possibility of its parallel implementation.
This is due to the peculiarities of the multiresolution wavelet transform, which defines independent sets of interrelated coefficients of different levels. The method can be used for image analysis when solving various problems in systems based on computer vision technologies, including for contextual image search, object detection and recognition in images, image compression, etc.
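The feature-vector construction and the mean square error comparison can be sketched as follows (a minimal C++ sketch; the grid split assumes image dimensions divisible by the number of regions per side, which holds for the 64-region split of 512 × 512 images used above):

```cpp
#include <vector>

using WMat = std::vector<std::vector<double>>;

// Split a normalized weight map into an nb x nb grid of rectangular
// regions and take the mean weight of each region as a feature.
std::vector<double> texture_features(const WMat& w, int nb) {
    int rh = static_cast<int>(w.size()) / nb;        // region height
    int cw = static_cast<int>(w[0].size()) / nb;     // region width
    std::vector<double> v;
    for (int bi = 0; bi < nb; ++bi)
        for (int bj = 0; bj < nb; ++bj) {
            double s = 0.0;
            for (int m = bi * rh; m < (bi + 1) * rh; ++m)
                for (int n = bj * cw; n < (bj + 1) * cw; ++n)
                    s += w[m][n];
            v.push_back(s / (rh * cw));              // average region weight
        }
    return v;
}

// Mean square error criterion between two feature vectors of equal length.
double mse(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        s += (a[i] - b[i]) * (a[i] - b[i]);
    return s / a.size();
}
```

The per-region averages are independent of one another, which is one way the method admits a parallel implementation.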

Conclusion
The proposed method for constructing a texture model of images based on energy features is simple to implement and allows a high processing speed. The results show that the described features provide stable discrimination of images with different textural characteristics. This can be used to construct effective classifiers for solving problems in systems based on computer vision technologies; for example, the method can be applied to image compression or contextual image search. It should be noted that the practical application of the considered approach requires taking into account the operating conditions of the corresponding systems, for example, the effect of interference of various kinds in control and information processing systems, including interference from electrostatic discharges [30-32].