GDC block: gradient-guided direction-aware convolution block for image classification

Convolutional Neural Networks (CNNs) have achieved great success in visual applications. In the field of image classification, researchers usually customize CNN models to meet the needs of different real-world applications. This consumes a great deal of human labor and computing resources yet yields only slight performance improvements. Besides, some recent works try to integrate prior knowledge into classic CNN models to improve their accuracy, but they are usually effective only for specific applications. In this paper, we propose a gradient-guided direction-aware convolution (GDC) block. It can replace the low-level convolutions of an existing CNN without changing the off-the-shelf architecture. The gradient priors provide the object-shape information that a CNN's low-level convolutions need, and the direction-aware mechanism expands the receptive field. The scheme is a trade-off between model size and model accuracy. Experimental results show that it moderately reduces the size of CNN models while enhancing their performance.


Introduction
CNNs have achieved great success in visual applications. Especially in recent years, the performance of image classification has improved significantly, and some classic models have become mainstream architectures (e.g., VGG [1] and ResNet [2]). To deploy CNNs in real-world applications, researchers generally design new architectures to meet different requirements. This consumes a lot of computing resources and human labor yet yields only a small performance gain. Besides, these well-designed models generally improve accuracy through more trainable parameters and more complex connections. However, because the computing resources of terminal devices are limited, a CNN must provide high accuracy within a certain computing budget, and complex models are often difficult to deploy. Therefore, we argue that it makes sense to improve the efficiency of existing classic CNN models without paying the high price of customizing new architectures.

Efficient models
Recently, many efficient models have been proposed. SqueezeNet [3] reduces parameters and computation significantly. MobileNet [4] builds lightweight deep neural networks on a streamlined architecture that uses depthwise separable convolutions. These methods focus on modifying the model structure to reduce the number of parameters and thus improve model efficiency. However, this usually comes at the cost of reduced accuracy. Alternatively, some methods work on training small networks. Distillation [5] uses a larger network to teach a smaller network. Neural Architecture Search (NAS) [6] searches for model structures automatically. However, these methods require a large amount of computing resources to obtain a small model (typically thousands of GPU hours), which is often unacceptable in many application scenarios.
Alternatively, some research focuses on how to effectively integrate priors into CNNs to improve their performance. A hybrid model that couples discrete wavelet transforms (WT) and artificial neural networks (ANN) is proposed for forecasting water temperature in [7]. Besides, a graph wavelet neural network (GWNN) is presented in [8], which leverages the graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on the graph Fourier transform. However, due to the limited interpretability of CNNs, these models are usually effective only for specific applications.

GDC block
In this paper, we propose the GDC block to improve model efficiency without changing off-the-shelf architectures. Our GDC block can easily replace the low-level convolutions of existing networks. By improving the efficiency of low-level feature extraction, it moderately reduces the size of a CNN model while enhancing its performance. In the following, we present the GDC block and describe its role in existing CNN models. An off-the-shelf network that employs the GDC block is called a GDCNet. For convenience, we take the top-left corner of the image as the origin and establish a rectangular coordinate system with the horizontal direction as the X-axis and the vertical direction as the Y-axis. Figure 1 gives the details of the GDC block. In our GDC block, an original convolution is split into two branches. One branch keeps the original filters to extract basic features, while the other branch explores object shapes with a direction-aware mechanism: it uses gradient priors to expose object shapes and, with one-dimensional filters along the X-axis and Y-axis, achieves direction-aware convolution. Specifically, we first calculate the gradients G_x and G_y of the input feature, then apply X-axis filters to capture horizontal information on G_x and Y-axis filters to capture vertical information on G_y. Taking 3×3 convolutions as an example, the GDC block contains three parallel layers with kernel sizes of 3×3, 1×3, and 3×1. The 3×3 output and the summed output of the 1×3 and 3×1 layers are concatenated as the output of the GDC block. Without changing a given architecture, we simply replace its low-level convolutional layers with GDC blocks to construct a GDCNet.
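The structure described above can be illustrated with a minimal single-channel NumPy sketch. The helper `conv2d` and the function `gdc_block` are hypothetical names introduced here for illustration, and the central-difference gradient filters are an assumed concrete form of the paper's "simple addition and subtraction" priors; a real implementation would use a deep-learning framework's multi-channel convolutions.

```python
import numpy as np

def conv2d(x, k):
    """'Same'-padded 2-D cross-correlation for single-channel arrays."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

# Fixed gradient priors (assumed central-difference form): no trainable weights.
FX = np.array([[-1.0, 0.0, 1.0]])   # X-axis gradient filter f_x
FY = FX.T                           # Y-axis gradient filter f_y

def gdc_block(x, w33, w13, w31):
    """One single-channel GDC unit: a basic 3x3 branch, plus a
    direction-aware branch (1x3 conv on G_x summed with 3x1 conv on G_y),
    with the two branch outputs concatenated along the channel axis."""
    basic = conv2d(x, w33)                           # branch 1: original 3x3 conv
    gx, gy = conv2d(x, FX), conv2d(x, FY)            # gradient priors G_x, G_y
    directional = conv2d(gx, w13) + conv2d(gy, w31)  # summed 1x3 + 3x1 branch
    return np.stack([basic, directional])            # concatenated output
```

For example, feeding a 6×6 map through the block yields a 2×6×6 output: one basic-feature channel and one direction-aware channel per input channel.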

Why GDC block works
The effectiveness of our GDC block can be summarized as the following three aspects.
First of all, the gradient is one of the most important low-level image features for improving classification accuracy: it effectively detects edges and thus reveals object shapes. Generally, we calculate the X-axis and Y-axis gradients with central differences,

G_x(x, y) = I(x+1, y) − I(x−1, y),   G_y(x, y) = I(x, y+1) − I(x, y−1),

where I denotes the input feature. This convolution can be realized with simple addition and subtraction operations, which occupy no network parameters and consume very little computing resource. Besides, visualizations of well-trained CNN models show that their low-level feature maps indeed extract such low-level image features. Hence, without changing the off-the-shelf architecture, we present the GDC block as a replacement for the low-level convolutions.
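The edge-detecting behavior of the gradient prior can be checked directly. A minimal sketch, assuming the central-difference form above: on a vertical step edge, G_x responds only at the columns adjacent to the step and is zero everywhere else, using nothing but subtraction.

```python
import numpy as np

# A vertical step edge: left half 0, right half 1.
img = np.zeros((4, 6))
img[:, 3:] = 1.0

# Central-difference X gradient, G_x(y, x) = I(y, x+1) - I(y, x-1),
# realized with plain subtraction and no trainable parameters.
gx = np.zeros_like(img)
gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
```

The response is confined to the two columns straddling the edge, which is exactly the shape information the low-level convolutions are meant to extract.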
Second, G_x represents the pixel change along the X-axis, so an X-axis convolution on G_x expands the receptive field along the X-axis, and likewise for the Y-axis convolution on G_y. This can be expressed as

f_{1×3}(G_x) = f_{1×3}(f_x ∗ I),   f_{3×1}(G_y) = f_{3×1}(f_y ∗ I),

where f denotes a network filter with trainable parameters W, and f_x and f_y denote the fixed gradient filters. Generally, enlarging the receptive field is a well-known strategy for improving CNN performance. Hence, compared with the original convolution, this direction-aware mechanism improves the classification accuracy of existing architectures. Third, thanks to the two-branch structure, the GDC block not only keeps extracting the original basic features but also reduces the number of parameters.
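The receptive-field claim follows from kernel composition: convolving a width-3 gradient filter with a width-3 learned filter yields an effective filter of width 3 + 3 − 1 = 5 on the raw input. A short sketch, assuming the central-difference f_x and illustrative (not learned) weights for the 1×3 filter:

```python
import numpy as np

fx = np.array([-1.0, 0.0, 1.0])   # fixed X-gradient filter (width 3)
w  = np.array([0.2, 0.5, 0.3])    # a 1x3 filter with illustrative weights

# The composed kernel the direction-aware branch effectively applies
# to the raw input along the X-axis: width 3 + 3 - 1 = 5, versus the
# width-3 footprint of a plain 3x3 convolution.
effective = np.convolve(fx, w)
```

By associativity of convolution, filtering the gradient with `w` is identical to filtering the raw signal with `effective`, so each output pixel sees 5 input pixels along the X-axis instead of 3.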

Experimental analysis
We evaluate the GDC block with several representative benchmark models, including Cifar-quick [9], VGG, ResNet, and DenseNet [10], on CIFAR-10, CIFAR-100 [11], and ImageNet [12]. Cifar-quick has only 3 convolutional layers and can be seen as a representative of existing small models. VGG and ResNet are the most widely used classic CNN models across vision tasks, which makes them convincing benchmarks. We replace only the low-level convolutions of the above networks with GDC blocks. Since Cifar-quick has only three convolutional layers, we treat them all as low-level convolutions. In VGG, ResNet, and DenseNet, we take the convolutional layers before the first downsampling operation as low-level convolutions, because they can be regarded as extracting low-level features and their feature maps have the same resolution as the input image.

As can be observed from Table 1 and Table 2, the performance of all models is consistently lifted by a clear margin, suggesting that the benefits of the GDC block can be combined with various architectures. With the GDC block, the efficiency of the low-level features clearly improves. Moreover, Cifar-quick gains a significant accuracy improvement, which confirms that with the GDC block existing small models can run more efficiently on end devices. To further verify the efficiency of our GDC block, we compare GDCNets against normally trained deep models on ImageNet. As shown in Table 3, GDCNet still achieves better performance, underlining that the GDC block improves existing CNN models.

Table 4 presents the model parameters of the GDCNets and their baselines. GDCNet consistently uses fewer parameters than its baseline architecture. For a small model, all convolutions are replaced by GDC blocks, so the parameter count drops substantially; for deep models, only some low-level convolutions are replaced, so the reduction is slight. Nevertheless, since the GDC block maintains the model architecture, improves performance, and still removes some parameters, we consider it a significant step forward.
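The parameter savings per replaced layer can be counted directly. A sketch under an assumed configuration (the paper does not state the channel split here, so we assume the output channels are divided evenly between the two branches, half from the 3×3 branch and half from the summed 1×3 + 3×1 branch; the fixed gradient filters are parameter-free):

```python
def conv_params(cin, cout, kh, kw):
    """Weight count of a conv layer (biases ignored for simplicity)."""
    return cin * cout * kh * kw

def gdc_params(cin, cout):
    """Assumed GDC split: half the output channels from the 3x3 branch,
    half from the summed 1x3 + 3x1 direction-aware branch. The gradient
    filters f_x and f_y are fixed, so they add no trainable weights."""
    half = cout // 2
    return (conv_params(cin, half, 3, 3)    # basic 3x3 branch
            + conv_params(cin, half, 1, 3)  # X-axis filters on G_x
            + conv_params(cin, half, 3, 1)) # Y-axis filters on G_y
```

Under this assumption, a 64→64 layer drops from 36,864 to 30,720 weights, i.e. the direction-aware branch costs (1·3 + 3·1)/2 = 3 weights per input-output pair where the 3×3 branch costs 4.5, for roughly a one-sixth reduction per replaced layer.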

Conclusion
We propose a novel GDC block to explore the effectiveness of integrating gradient priors into CNN models. Through its direction-aware mechanism, it improves the performance of several benchmark models. Given an off-the-shelf network, the GDC block offers a trade-off between model size and model accuracy. Since there is no need to redesign the model architecture, it is valuable for real-world applications.