Network pruning is a useful approach for reducing memory footprint and bandwidth requirements. Pruning approaches originated in the early 1990s as a way to reduce a large trained network to a smaller one without requiring retraining [15]. This made it possible to deploy neural networks in constrained settings such as embedded devices and dedicated electronic components. Pruning eliminates redundant neurons or parameters that have no bearing on the correctness of the output [16]; this situation arises when weights are zero, close to zero, or duplicated. Pruning therefore decreases the computational cost. If pruned networks are retrained, they may escape a prior local minimum and improve accuracy even further. Early network pruning research falls into two classes: sensitivity-computation and penalty-term approaches [17, 18, 64–70].
Recent research has advanced both classes of network pruning as well as combinations of the two, and new pruning strategies continue to emerge. Contemporary pruning procedures may be categorised along several dimensions:
- Structured versus unstructured pruning, depending on whether the pruned network is symmetric or not,
- Neuron versus associative pruning, depending on the type of component pruned, or
- Dynamic versus static pruning.
In static pruning, all pruning stages are conducted offline prior to any inference, whereas dynamic pruning is performed at runtime. While there is some overlap among the classifications, we classify network pruning strategies as dynamic or static in this work. Pruning can be applied at several granularities: element-wise, row-wise, column-wise, filter-wise, and layer-wise. Element-wise pruning generally has the least influence on the model's architecture, but it results in an unstructured design [70–76].
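As a minimal sketch of these two extremes of granularity, assuming NumPy and purely illustrative tensor shapes and thresholds, the following zeroes individual weights (unstructured) versus whole filters (structured):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))  # conv weights: (filters, channels, kH, kW)

# Element-wise (unstructured): zero individual weights below a magnitude threshold.
elem_mask = np.abs(w) >= 0.5
w_unstructured = w * elem_mask

# Filter-wise (structured): zero whole filters with the smallest l1-norms,
# keeping the tensor's regular structure intact.
l1 = np.abs(w).reshape(8, -1).sum(axis=1)
keep = l1 >= np.sort(l1)[2]               # drop the two weakest filters
w_structured = w * keep[:, None, None, None]

print("unstructured sparsity:", 1 - elem_mask.mean())
print("filters kept:", int(keep.sum()), "of", len(keep))
```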
$$\underset{PR}{\text{arg}\,\text{min}}\, L = NT\left(x;W\right) - {NT}_{p}\left(x;{W}_{PR}\right), \quad \text{where } {NT}_{p}\left(x;{W}_{PR}\right) = PR\left(NT\left(x;W\right)\right)$$
Regardless of category, pruning can be expressed mathematically as in the equation above. NT denotes the full neural network taking x as input, consisting of a sequence of layers (e.g., convolutional layers, pooling layers, etc.). NTp denotes the pruned network, whose loss in performance L is measured relative to the unpruned network. Classification accuracy is a common measure of network performance. The pruning function PR(·) produces a new network configuration NTp as well as pruned weights WPR. The impact of PR(·) on NTp is the focus of the following sections, and how WPR is obtained is also considered [77–81].
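As a minimal sketch of this objective, assuming a single linear layer stands in for NT and a hypothetical magnitude_prune function plays the role of PR(·), the performance loss L can be measured as the accuracy gap between the unpruned and pruned models:

```python
import numpy as np

def evaluate(weights, x, y):
    """Classification accuracy of a linear model (stand-in for NT(x; W))."""
    return np.mean(np.argmax(x @ weights, axis=1) == y)

def magnitude_prune(weights, threshold=0.05):
    """A simple pruning function PR(.): zero small-magnitude weights, yielding W_PR."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 32))             # input batch x
w = rng.normal(scale=0.2, size=(32, 10))   # "trained" weights W (random, for illustration)
y = np.argmax(x @ w, axis=1)               # labels consistent with W

w_pr = magnitude_prune(w)                  # NT_p(x; W_PR) = PR(NT(x; W))
L = evaluate(w, x, y) - evaluate(w_pr, x, y)
print(f"performance loss L = {L:.4f}")
```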
Static Pruning
Static pruning is a network optimization approach that eliminates neurons from the network after training and before inference; no further trimming is done during inference. Static pruning usually consists of three steps: 1) choosing which parameters to prune, 2) deciding how to prune the neurons, and 3) fine-tuning or retraining if necessary. Retraining the pruned network to attain accuracy equal to that of the unpruned network may improve performance, but it can take substantial offline computing time and energy [19].
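A minimal sketch of this three-step pipeline, assuming PyTorch and an illustrative 50% per-layer magnitude criterion (the model and training data here are random placeholders, not a specific published method):

```python
import torch
import torch.nn as nn

# A small model stands in for the trained, unpruned network.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# Step 1: choose what to prune -- here, the 50% smallest-magnitude weights per layer.
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:                                   # prune weight matrices, keep biases
        threshold = p.abs().flatten().quantile(0.5)
        masks[name] = (p.abs() >= threshold).float()

# Step 2: prune by zeroing the selected weights.
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.mul_(masks[name])

# Step 3: fine-tune, re-applying the masks so pruned weights stay at zero.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
for _ in range(10):
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
```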
Dynamic Pruning
Dynamic pruning decides at runtime which layers, channels, or neurons will not participate in further computation. By taking advantage of variation in the input data, dynamic pruning can overcome the limits of static pruning, potentially reducing computation, bandwidth, and power consumption. In most cases, dynamic pruning does not involve runtime fine-tuning or retraining. The decision method that determines what to prune is the most critical consideration [20].
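A minimal sketch of one possible runtime decision component, assuming PyTorch; the gating head, keep ratio, and layer shapes are illustrative assumptions rather than a specific published method:

```python
import torch
import torch.nn as nn

class DynamicChannelGate(nn.Module):
    """Per-input channel gating: a small decision head picks, for each sample,
    which output channels of a conv layer to keep; the rest are zeroed."""
    def __init__(self, in_ch, out_ch, keep_ratio=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.gate = nn.Linear(in_ch, out_ch)   # the runtime decision component
        self.k = max(1, int(out_ch * keep_ratio))

    def forward(self, x):
        # Decide at runtime, from the input itself, which channels matter.
        scores = self.gate(x.mean(dim=(2, 3)))           # (batch, out_ch)
        topk = scores.topk(self.k, dim=1).indices
        mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)
        # Compute the conv, then suppress the pruned channels for this input.
        return self.conv(x) * mask[:, :, None, None]

layer = DynamicChannelGate(16, 32)
out = layer(torch.randn(4, 16, 8, 8))  # different channels active per sample
```

In a real implementation the gated-off channels would be skipped rather than computed and zeroed; the sketch only illustrates the decision mechanism.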
Table 1
Comprehensive Analysis of Network Pruning
Approach | Pruning Strategy | Description and Inference Impact |
Static Pruning | Magnitude-based Pruning | It has been postulated and widely accepted that trained weights with large magnitudes are more important than those with small magnitudes; this observation is the cornerstone of magnitude-based approaches. Magnitude-based pruning strategies aim to identify and eliminate weights or features that are unnecessary at inference time. Unused values can be trimmed in both the kernels and the activation maps. The most intuitive magnitude-based strategy is to prune all zero-valued weights, or all weights below an absolute-value threshold [21]. |
Filter-wise Pruning | It employs the l1-norm to eliminate filters that have no impact on classification accuracy. On the CIFAR-10 dataset, pruning whole filters and their corresponding feature maps lowered inference costs by 34% for VGG-16 and 38% for ResNet-110, with accuracy improvements of 0.75% and 0.02%, respectively [22]. |
Penalty-based Pruning | The purpose of penalty-based pruning is to modify the error function or add constraints to the training process in the form of penalty terms. Under the penalty, some weights are driven to zero or near zero during training; these values are then trimmed (see the sketch after this table) [23]. |
Element-wise Pruning | Element-by-element pruning can result in unstructured network organisations. The resulting sparse weight matrices are difficult to process efficiently on conventional instruction-set processors, and without specialised hardware support they are often difficult to compress or accelerate. Group LASSO compensates for these shortcomings with a structured pruning strategy that eliminates whole groups of neurons while preserving the network topology [24]. |
Group-wise Brain Damage | It likewise applies the group LASSO constraint, but only to filters; this creates structured sparsity and simulates brain damage. On the VGG network, it achieved a 2× speedup with a 0.7% ILSVRC-2012 accuracy loss [25]. |
Network Slimming | It applies LASSO to the batch-normalisation (BN) scaling factors. BN normalises activations using statistical parameters collected during the training phase. Because slimming reuses the existing BN scale parameters, it introduces no extra parameters or forward-pass overhead. Setting a BN scaling factor to zero effectively prunes the corresponding channel. On ILSVRC-2012, it achieved an 82.5% size reduction and a 30.4% computation reduction with VGG, without losing accuracy [26]. |
Sparse Structure Selection | It is a generalised network-slimming approach. It prunes neurons, groups, and residual blocks by applying LASSO to sparse scaling factors. Using an improved gradient method, Accelerated Proximal Gradient (APG), the approach achieves a 4× speed-up on VGG-16 with a 3.93% ILSVRC-2012 top-1 accuracy loss, without fine-tuning [27]. |
Pruning combined with Tuning or Retraining | Deep Compression | It prunes, with a static technique, connections that do not contribute to classification accuracy: weights with small values are eliminated in addition to feature-map trimming. The network is then retrained to recover accuracy. This process, repeated three times, resulted in a 9× to 13× reduction in total parameters with no loss of accuracy; the majority of the eliminated parameters came from the fully connected layers (FCLs) [28]. |
Recoverable Pruning | In most approaches, elements that have been pruned cannot be recovered, so network capacity may be reduced. Recovering network capability requires extensive retraining; retraining the network for deep compression took millions of iterations. Many techniques therefore use recoverable pruning algorithms to circumvent this flaw: the trimmed components may still play a role in later training phases, adapting to the reduced network [29]. |
Soft Filter Pruning | It extends recoverable pruning to the filter dimension. SFP achieves structured compression results with the added benefit of shorter, more predictable inference times. Moreover, SFP can be employed on networks that are difficult to compress, with a 29.8% speedup on ResNet-50 and a 1.54% ILSVRC-2012 top-1 accuracy loss. In comparison to Guo's recoverable-weight approach, SFP exploits the structure of the filter to produce inference speedups closer to the theoretical values on general-purpose hardware [29]. |
AutoPruner | It integrates the pruning and fine-tuning stages of the three-stage pipeline into a single training-friendly layer. This layer gradually prunes the network during training, yielding a less complex network. With a 2.39% ILSVRC-2012 top-1 accuracy loss, AutoPruner pruned 73.59% of the compute operations of VGG-16; on ResNet-50 it achieved a 65.80% reduction in compute operations with a 3.10% accuracy loss [30]. |
Dynamic Pruning | Conditional Computing | Conditional computing activates only the portion of a network that is relevant to a given input, rather than the full network. Excluding the inactive neurons from the computation is what constitutes pruning here: they do not contribute to the final result, minimising the number of calculations required. Conditional computing affects both training and inference [31]. |
Reinforcement Learning | Adaptive networks aim to speed up inference by conditionally detecting early exits. Thresholds can be used to trade off network computation against accuracy. Adaptive networks use multiple intermediate classifiers to allow an early exit. One form of adaptive network is the cascade network, composed of several serial networks with their own output layers rather than per-layer outputs. Cascade networks offer the benefit of early exit, since not all output layers need to be calculated. If the early accuracy of a cascade network is insufficient, inference can be forwarded to a cloud device [32]. |
Differentiable Adaptive Networks | Because most of the aforementioned decision components are non-differentiable, RL is used to train them. A variety of differentiable approaches have been developed to reduce this training complexity [33]. |
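As referenced in the penalty-based row above, the following is a minimal sketch of penalty-based pruning, assuming PyTorch; the l1 penalty strength and trimming threshold are illustrative values, and the model and data are random placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(128, 64), torch.randint(0, 10, (128,))
l1_lambda = 1e-3  # penalty strength (illustrative)

# Training with an added l1 penalty term drives unimportant weights toward zero.
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss = loss + l1_lambda * model.weight.abs().sum()  # the penalty term
    loss.backward()
    opt.step()

# Afterwards, weights at or near zero are trimmed away.
with torch.no_grad():
    model.weight.mul_((model.weight.abs() >= 1e-2).float())
print("weight sparsity:", (model.weight == 0).float().mean().item())
```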
Pruning methods vary widely and are difficult to compare. In [34], a single benchmark system is proposed for comparing pruning performance. The value of the pre-trained weights is a point of contention. According to [35], the pruned model can be trained from scratch using a random weight initialisation, which implies that the pruned architecture itself is critical to success; in light of this finding, pruning algorithms can be viewed as a form of neural architecture search (NAS). Because the weight values can be retrained, the authors concluded that they are not important on their own. However, the lottery ticket hypothesis [36] attained equal accuracy only when the weight initialisation was identical to that of the unpruned model. This disagreement was resolved in [37] by demonstrating that what truly matters is the form of pruning: unstructured pruning, in particular, can only be fine-tuned to restore accuracy, whereas structured pruning can be trained from scratch. They also investigated the effectiveness of dropout and l0 regularisation and found that simple magnitude-based pruning performed better. They developed a magnitude-based pruning technique and demonstrated that the pruned ResNet-50 outperformed the state of the art (SOTA) at the same computational complexity.