hyper-sinh: An Accurate and Reliable Function from Shallow to Deep Learning in TensorFlow and Keras

This paper presents the 'hyper-sinh', a variation of the m-arcsinh activation function suitable for Deep Learning (DL)-based algorithms for supervised learning, such as Convolutional Neural Networks (CNN). hyper-sinh, developed in the open source Python libraries TensorFlow and Keras, is thus described and validated as an accurate and reliable activation function for both shallow and deep neural networks. Improvements in accuracy and reliability in image and text classification tasks on five (N = 5) benchmark data sets available from Keras are discussed. Experimental results demonstrate the overall competitive classification performance of both shallow and deep neural networks, obtained via this novel function. This function is evaluated with respect to gold standard activation functions, demonstrating its overall competitive accuracy and reliability for both image and text classification.


Introduction
Despite recent developments of activation functions for Machine Learning (ML)-based classifiers, such as the m-arcsinh (Parisi, 2020) for shallow Multi-Layer Perceptron (MLP) (Rumelhart et al., 1986), usable, repeatable and reproducible functions for both shallow and deep neural networks, e.g., the Convolutional Neural Network (CNN) (LeCun et al., 1995), have remained very limited and confined to three activation functions regarded as 'gold standard'. These include the Rectified Linear Unit (ReLU), the sigmoid function and its modified version, the hyperbolic tangent sigmoid or 'tanh' (Lin and Lin, 2003), which extends its range from [0, +1] to [-1, +1]. The sigmoid and tanh have well-known vanishing gradient issues; thus, the ReLU function was devised to be more scalable for deep neural networks, despite its 'dying ReLU' problem, which has recently been addressed by Parisi et al. (2020a). These three functions have been made freely accessible in the open source Python library named 'Keras' (Chollet et al., 2015) for Deep Learning. The availability of these functions in the public domain has enabled not-for-profit and for-profit organisations to leverage them for several intelligence-based applications, from academic to industrial ones (Chollet, 2017) (Parisi et al., 2020a).
Nevertheless, considering the above-mentioned challenges in the Computer Science and ML communities, such activation functions lack robustness on classification tasks of varying degrees of complexity, e.g., they may converge slowly or not at all (Vert and Vert, 2006) (Jacot et al., 2018) owing to trapping at local minima (Parisi et al., 2020b). Moreover, amongst the three above-mentioned activation functions, only the ReLU is applicable from shallow to deep neural networks, with its novel quantum variations (QReLU and m-QReLU) found to be more scalable than its traditional version only recently (Parisi et al., 2020a).
On the other hand, in sciences dealing with the study of human behaviour, considerable progress has been made in the last 20 years towards the prevention of mental health disorders (Sander et al., 2016) (Ebert et al., 2017). Specifically, professionals working in the field of counselling psychology have slightly enhanced their ability to grasp relational issues in their subjects via novel ML-based tele-monitoring technologies (Shatte et al., 2019). Nevertheless, these technologies have not yet changed the traditional counselling psychology practice, which is still based on a structured methodology that is adopted to help individuals to become more self-aware and more conscious of their own needs and moods (Pieterse et al., 2013). The main goal counsellors pursue is guiding individuals to get to know themselves at a deeper level and to help them discover and resurface their own resources to better manage their emotions in their daily life. This process first requires a tailored dialogue between the counsellor and the individual and, subsequently, leveraging practical tools to aid the individual in their experience to understand their inner self more deeply (Sutton, 2016). Moreover, there are still limitations within the counselling setting. For instance, individuals, out of fear, may not reveal fundamental aspects of their persona that would help counsellors guide them better in getting to know themselves. Furthermore, in many cases, subjects may express a verbal language opposite to their non-verbal one. Counsellors often struggle to understand the dynamic patterns observed in the behaviours of their subjects, and are thus unable to provide the required help and support to them.
In counselling, neural network algorithms, both shallow and deep depending on the amount of good-quality data and hardware available, have the potential to support counsellors in image and text classification tasks to understand and guide their subjects by helping them infer subtle dynamic changes in their behaviours. Via a careful and effective observation of images, micro- and macro-body movements, and facial expressions, it is possible to better interpret and understand the subjects' non-verbal language. Even the emotions underlying the written content from subjects may reveal inner aspects of their persona that are fundamental for counsellors to help resurface, so as to increase the subjects' self-awareness and related capability of 'self-healing' (Rennie, 2001). Therefore, from both theoretical and practical standpoints, there is an increasing need for accurate and reliable open source activation functions that reach convergence faster, avoid trapping at local minima, are more stable, and can be used and scaled across both shallow and deep neural network algorithms for image and text classification. Entirely written in Python and made freely available in TensorFlow (Abadi et al., 2016) and Keras (Chollet et al., 2015), the proposed hyperbolic function is demonstrated as a competitive function with respect to gold standard functions, which suits both shallow and deep neural networks, thus being accurate and reliable for pattern recognition to aid image and text classification tasks.
Thanks to its liberal license, it has been widely distributed as a part of the free software Python libraries TensorFlow (Abadi et al., 2016) and Keras (Chollet et al., 2015), and it is available for use for both academic research and commercial purposes.

Data sets used from Keras
The following benchmark data sets for image and text classification from Keras (Chollet et al., 2015) were used in the experiments described and discussed in this study:
• 'CIFAR-10' data set (Krizhevsky et al., 2009), having 50,000 32x32 colour images for training, and 10,000 images for testing, labelled based on 10 mutually exclusive classes of corresponding objects, including airplanes, automobiles (e.g., sedans, SUVs, etc.), birds, cats, deer, dogs, frogs, horses, ships, trucks (only big trucks);
• 'Fashion-MNIST' data set (Xiao et al., 2017), which has 60,000 28x28 grayscale images of 10 classes of fashion (T-shirts/tops, trousers, pullovers, dresses, coats, sandals, shirts, sneakers, bags, ankle boots), with 10,000 images for testing;
• 'MNIST' data set (LeCun, 1998), with 60,000 28x28 grayscale images of the 10 handwritten digits, having 10,000 images for testing;
• 'Reuters' data set (Apté et al., 1994), which has 11,228 news-wires from Reuters, labelled over 46 classes of topics. Each news-wire is encoded as a list of word indices based on their overall frequency in the data set. '0' (zero) is used to encode any unknown words. Words not seen in the training set but that are present in the test set have been skipped;
• 'IMDB' data set (Maas et al., 2011), with 25,000 pre-processed movie reviews from IMDB, labelled by sentiment (positive or negative). Each review is encoded as a list of word indices (integers) based on their overall frequency in the data set. '0' (zero) is used to encode any unknown words.
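Since each news-wire or review is a variable-length list of word indices, it has to be turned into a fixed-size vector before it can be fed to a fully connected network. The paper does not state which vectorisation was applied, so the following is only a minimal sketch of one common choice, a multi-hot (binary bag-of-words) encoding; the helper name `multi_hot` and the vocabulary size used in the toy example are illustrative assumptions.

```python
import numpy as np


def multi_hot(sequences, num_words):
    """Encode lists of word indices as fixed-size binary vectors.

    Assumption: a binary bag-of-words encoding; the paper does not
    specify the vectorisation that was actually used.
    """
    encoded = np.zeros((len(sequences), num_words), dtype="float32")
    for row, sequence in enumerate(sequences):
        # Keep only indices that fit in the chosen vocabulary size.
        kept = np.asarray([i for i in sequence if 0 <= i < num_words], dtype=int)
        encoded[row, kept] = 1.0
    return encoded


# Toy example: two "documents" over a hypothetical 5-word vocabulary.
toy_sequences = [[1, 4, 4, 2], [0, 3]]
toy_encoded = multi_hot(toy_sequences, num_words=5)
```

Repeated indices collapse to a single 1 in this encoding, so word counts and word order are discarded; a count-based or sequence-based representation would be an equally valid alternative.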

Baseline neural network models and hyperparameters
The purpose of this study is not to devise the most optimised, best-performing classifier for any of the classification tasks involved in sub-section 2.1 but, instead, to extend the m-arcsinh into a novel, accurate and reliable activation function that can scale from shallow to deep neural networks, and to evaluate it against the current gold standard functions available in the Python library Keras (Chollet et al., 2015). Therefore, baseline Fully-Connected Neural Network (FC-NN) and Convolutional Neural Network (CNN) models were used with the following hyperparameters for the respective classification tasks in sub-section 2.1. The activation functions in the convolutional layers were varied for testing purposes across the following: ReLU, sigmoid, tanh and the proposed hyper-sinh.
The CNN-related hyperparameters to classify the CIFAR-10 data set are as follows:
• three convolutional layers, each of which has a kernel size of 3x3;
• the following convolutional filters for each of the three convolutional layers (in order from the first layer to the third one): 32, 64, 64;
• max pooling is applied after the first and the second convolutional layers;
• after a flattening layer, two dense layers follow, the first one having 64 neurons and ReLU activation, the second one having 10 neurons as per the number of classes in the CIFAR-10 data set.
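To make the size of this baseline concrete, the feature-map shapes and the number of trainable parameters implied by the hyperparameters above can be worked out by hand. The sketch below does this arithmetic in plain Python. It assumes 'valid' padding, a stride of 1 and 2x2 pooling windows, none of which is stated in the text, and the function name `trace_cifar10_cnn` is illustrative.

```python
def trace_cifar10_cnn(height=32, width=32, channels=3,
                      filters=(32, 64, 64), dense_units=(64, 10), kernel=3):
    """Trace shapes and count parameters of the baseline CIFAR-10 CNN.

    Assumptions (not given in the paper): 'valid' padding, stride 1,
    2x2 max pooling, and bias terms in every layer.
    """
    shapes, params = [], 0
    h, w, c = height, width, channels
    for layer, n_filters in enumerate(filters):
        # A 'valid' 3x3 convolution shrinks each spatial side by kernel - 1.
        h, w = h - kernel + 1, w - kernel + 1
        params += kernel * kernel * c * n_filters + n_filters
        c = n_filters
        shapes.append(("conv", h, w, c))
        # Max pooling follows the first and second convolutions only.
        if layer < 2:
            h, w = h // 2, w // 2
            shapes.append(("pool", h, w, c))
    flattened = h * w * c
    inputs = flattened
    for units in dense_units:
        params += inputs * units + units
        inputs = units
    return shapes, flattened, params


shapes, flattened, total_params = trace_cifar10_cnn()
```

Under these assumptions, the 32x32 input shrinks to 4x4 feature maps with 64 channels before flattening, so the first dense layer receives 1,024 inputs.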
Listing 1 provides the snippet of code in Python to use a CNN to classify the CIFAR-10 data set, with different activation functions available in Keras (Chollet et al., 2015), including the novel 'hyper-sinh'.
Listing 1: Python code to use a CNN to classify the CIFAR-10 data set, with different activation functions available in Keras (Chollet et al., 2015), including the proposed 'hyper-sinh'.
The FC-NN-related hyperparameters to classify the Fashion-MNIST data set are as follows:
• one flattening layer;
• one dense layer with 128 neurons, with varying activation based on the testing case scenario (one amongst sigmoid, tanh, ReLU and the proposed hyper-sinh);
• a final dense layer having 10 neurons as per the number of classes in the Fashion-MNIST data set.
Listing 2 provides the snippet of code in Python to use a FC-NN to classify the Fashion-MNIST data set, with different activation functions available in Keras (Chollet et al., 2015), including the novel 'hyper-sinh'.
Listing 2: Python code to use a FC-NN to classify the Fashion-MNIST data set, with different activation functions available in Keras (Chollet et al., 2015), including the proposed 'hyper-sinh'.
The CNN-related hyperparameters to classify the MNIST data set are as follows:
• two convolutional layers, each of which has a kernel size of 3x3;
• the following convolutional filters for each of the two convolutional layers respectively (in order from the first layer to the second one): 32, 64;
• max pooling is applied after the first and the second convolutional layers;
• after a flattening layer, a dropout layer is leveraged with 0.5 (50%) as dropout rate;
• a final dense layer with softmax activation, having 10 neurons as per the number of classes in the MNIST data set.
Listing 3 provides the snippet of code in Python to use a CNN to classify the MNIST data set, with different activation functions available in Keras (Chollet et al., 2015), including the novel 'hyper-sinh'.
Listing 3: Python code to use a CNN to classify the MNIST data set, with different activation functions available in Keras (Chollet et al., 2015), including the proposed 'hyper-sinh'.
The FC-NN-related hyperparameters to classify the Reuters news-wires data set are as follows:
• one dense layer with 512 neurons, with varying activation based on the testing case scenario (one amongst sigmoid, tanh, ReLU and the proposed hyper-sinh);
• a dropout layer is leveraged with 0.5 (50%) as dropout rate;
• a final dense layer with softmax activation, having 46 neurons as per the number of classes/topics in the Reuters news-wires data set.
Listing 4 provides the snippet of code in Python to use a FC-NN to classify the Reuters news-wires and the IMDB data sets, with different activation functions available in Keras (Chollet et al., 2015), including the novel 'hyper-sinh'.
Listing 4: Python code to use a FC-NN to classify the Reuters news-wires data set, with different activation functions available in Keras (Chollet et al., 2015), including the proposed 'hyper-sinh'.
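As an illustration of how such a baseline can be assembled with a swappable activation, the following minimal sketch builds the FC-NN described for the Reuters news-wires data set (512-neuron dense layer, 50% dropout, 46-way softmax). It is not the authors' original listing. The helper name `build_fc_nn`, the input dimension of 1,000, the Adam optimiser and the categorical cross-entropy loss are assumptions made only to obtain a runnable example.

```python
import numpy as np
import tensorflow as tf


def build_fc_nn(input_dim, num_classes, activation):
    """Baseline FC-NN: Dense(512) -> Dropout(0.5) -> Dense(softmax).

    `activation` may be a Keras activation name or any callable.
    Optimiser and loss are assumptions, not taken from the paper.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(512, activation=activation),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# One model per built-in gold standard activation; only the hidden
# layer's activation differs between the test scenarios.
baseline_models = {
    name: build_fc_nn(input_dim=1000, num_classes=46, activation=name)
    for name in ("relu", "sigmoid", "tanh")
}
```

Because Keras accepts a Python callable wherever an activation name is expected, a custom function can be passed to the same helper without changing the rest of the model.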

hyper-sinh: A reliable activation function for both shallow and deep learning
For a function to be generalised as an activation function for both shallow and deep neural networks, such as FC-NN and CNN respectively, it has to be able to 1) avoid common gradient-related issues, such as the vanishing and exploding gradient problems, and 2) improve discrimination of input data into target classes via a transfer mechanism of appropriate non-linearity and extended range. Considering the two-fold value of m-arcsinh (Parisi, 2020) as a kernel and activation function concurrently for optimal separating hyperplane- and shallow neural network-based classifiers, it was leveraged as the baseline function to be extended for it to scale to deep neural networks. Thus, although the arcsinh was replaced with the hyperbolic sine (sinh), of which it is the inverse, and the square root function was replaced with the basic cubic function, their weights were kept as per the m-arcsinh (Parisi, 2020) equivalent implementation, i.e., 1/3 now multiplies sinh, whilst 1/4 now multiplies the cubic function.
Thus, the novel function hyper-sinh was devised to be suitable for both shallow and deep neural networks concurrently by leveraging a weighted interaction effect between the hyperbolic nature of the hyperbolic sine function ('sinh') for positive values and the non-linear characteristic of the cubic function for negative values and 0 (zero), more suitable for deep neural networks, whilst retaining their appropriateness for shallow learning too, thus satisfying both the above-mentioned requirements:

$$\text{hyper-sinh}(x) = \begin{cases} \frac{1}{3}\sinh(x), & x > 0 \\ \frac{1}{4}x^{3}, & x \leq 0 \end{cases}$$

The derivative of hyper-sinh for positive values can be expressed as:

$$\frac{d}{dx}\,\text{hyper-sinh}(x) = \frac{1}{3}\cosh(x), \quad x > 0$$

The derivative of hyper-sinh for negative values and 0 (zero) can be expressed as:

$$\frac{d}{dx}\,\text{hyper-sinh}(x) = \frac{3}{4}x^{2}, \quad x \leq 0$$

Listing 5 provides the snippet of code in Python that implements the proposed hyper-sinh function as an activation function and its derivative in TensorFlow (Abadi et al., 2016).
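Independently of any deep learning framework, the piecewise form described in the text (sinh weighted by 1/3 for positive inputs, the cubic function weighted by 1/4 for negative inputs and zero) can be checked numerically. The following is a minimal NumPy sketch rather than the authors' implementation; the function names and the finite-difference check are illustrative assumptions.

```python
import numpy as np


def hyper_sinh(x):
    """sinh(x)/3 for x > 0 and x^3/4 for x <= 0, as described in the text."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, np.sinh(x) / 3.0, x ** 3 / 4.0)


def hyper_sinh_derivative(x):
    """cosh(x)/3 for x > 0 and (3/4)x^2 for x <= 0."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, np.cosh(x) / 3.0, 0.75 * x ** 2)


# Central finite differences, evaluated away from 0 where the two
# branches meet, should agree with the analytic derivative.
points = np.array([-2.0, -0.5, 0.5, 2.0])
step = 1e-6
numeric = (hyper_sinh(points + step) - hyper_sinh(points - step)) / (2 * step)
max_error = np.max(np.abs(numeric - hyper_sinh_derivative(points)))
```

Both branches evaluate to 0 at x = 0, so the function is continuous there, whereas its derivative is not: it equals 0 when approached from the left and 1/3 when approached from the right.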
Listing 5: Using the hyper-sinh function as an activation function in TensorFlow (Abadi et al., 2016).
Listing 6 provides the snippet of code in Python that implements the proposed hyper-sinh function in Keras (Chollet et al., 2015).
Listing 6: Using the hyper-sinh function as an activation function in Keras (Chollet et al., 2015).
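For readers who wish to experiment, the sketch below shows one way the function could be written with TensorFlow operations and passed to a Keras layer. It is not the authors' original listing: it assumes the piecewise form described in this section (sinh(x)/3 for x > 0, x^3/4 otherwise), and the clamping of the sinh argument as well as the small Fashion-MNIST-style model are illustrative choices.

```python
import numpy as np
import tensorflow as tf


def hyper_sinh(x):
    """Assumed form: sinh(x)/3 for x > 0, x^3/4 for x <= 0."""
    # Feed sinh only non-negative inputs, so the branch discarded by
    # tf.where cannot yield inf (and NaN gradients) for large negative x.
    positive_branch = tf.math.sinh(tf.maximum(x, 0.0)) / 3.0
    non_positive_branch = x * x * x / 4.0
    return tf.where(x > 0, positive_branch, non_positive_branch)


# Automatic differentiation should recover cosh(x)/3 and (3/4)x^2.
x = tf.constant([-2.0, -0.5, 0.5, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = hyper_sinh(x)
grad = tape.gradient(y, x).numpy()

# Keras accepts any callable as a layer activation (layer sizes follow
# the Fashion-MNIST FC-NN described earlier).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=hyper_sinh),
    tf.keras.layers.Dense(10),
])
```

Letting automatic differentiation derive the gradient avoids maintaining a separate derivative function, at the cost of evaluating both branches of `tf.where` for every input.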

Performance evaluation
The accuracy of the FC-NN and CNN using different activation functions, as described in sub-sections 2.2 and 2.3, on the data sets outlined in sub-section 2.1 was evaluated via the 'accuracy_score' function available in 'scikit-learn' (Pedregosa et al., 2011) from 'sklearn.metrics'.
The reliability of such classifiers was assessed via the weighted average of the precision, recall and F1-score computed via the 'classification_report' function, also available in 'scikit-learn' (Pedregosa et al., 2011) from 'sklearn.metrics'. To understand what classification accuracy and reliability are, and how they can be evaluated, please refer to the following studies: (Parisi et al., 2018a), (Parisi et al., 2018b), (Parisi et al., 2020b), (Parisi and RaviChandran, 2020).
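To clarify what the weighted averages represent, the sketch below recomputes them with NumPy on a toy example: per-class precision, recall and F1-score are averaged with weights proportional to the number of true samples in each class (the class 'support'). This is intended to mirror the weighted average reported by scikit-learn rather than replace it; the function name `weighted_report` and the toy labels are illustrative assumptions.

```python
import numpy as np


def weighted_report(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1-score."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precisions, recalls, f1_scores, supports = [], [], [], []
    for label in np.unique(y_true):
        tp = np.sum((y_pred == label) & (y_true == label))
        fp = np.sum((y_pred == label) & (y_true != label))
        fn = np.sum((y_pred != label) & (y_true == label))
        # Scores default to 0 when their denominator is 0.
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total > 0 else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
        supports.append(tp + fn)
    weights = np.asarray(supports, dtype=float) / np.sum(supports)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.dot(weights, precisions)),
        "recall": float(np.dot(weights, recalls)),
        "f1": float(np.dot(weights, f1_scores)),
    }


# Toy labels for three classes; one sample of class 0 is misclassified.
report = weighted_report([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
```

With this weighting, the weighted recall always coincides with the overall accuracy, which is why precision and F1-score carry the additional information about reliability.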

Results
Experimental results support the application of the proposed hyper-sinh activation function for both image and text classification tasks, as being accurate and reliable with the following classification performance:
• For shallow neural networks (FC-NN):
- The 2nd highest accuracy on 2 out of 5 data sets evaluated (Tables 4 and 5 on text classification).
- The 2nd highest reliability on 2 out of 4 data sets evaluated (Tables 4 and 5 on text classification).
• For deep neural networks (CNN):
- The best classification performance on 1 out of 5 data sets evaluated (Table 3).
- The 2nd highest classification performance on 1 out of 5 data sets evaluated (Table 1).
- The 2nd highest accuracy on 1 out of 5 data sets evaluated (Table 1 on image classification).
- The 2nd highest reliability on 1 out of 5 data sets evaluated (Table 1 on image classification).

Table 2. Results on performance evaluation of baseline (non-optimised) Fully Connected Neural Network (FC-NN) with one hidden layer having 128 neurons in Keras with different activation functions, including the proposed hyper-sinh function. The performance of such classifiers was evaluated on the 'Fashion-MNIST' data set available in Keras.

Columns: Classifier; Activation function; Epochs; Testing accuracy (0-1)

Table 3. Results on performance evaluation of baseline (non-optimised) two-layered Convolutional Neural Network (CNN) in Keras with different activation functions, including the proposed hyper-sinh function. The performance of such classifiers was evaluated on the 'MNIST' data set available in Keras.

Columns: Classifier; Activation function; Epochs; Testing accuracy (0-1)

Table 4. Results on performance evaluation of baseline (non-optimised) Fully Connected Neural Network (FC-NN) with one hidden layer having 512 neurons in Keras with different activation functions, including the proposed hyper-sinh function. The performance of such classifiers was evaluated on the 'Reuters' data set available in Keras.

Columns: Classifier; Activation function; Epochs; Testing accuracy (0-1)

Table 5. Results on performance evaluation of baseline (non-optimised) Fully Connected Neural Network (FC-NN) with one hidden layer having 512 neurons in Keras with different activation functions, including the proposed hyper-sinh function. The performance of such classifiers was evaluated on the 'IMDB' data set available in Keras.

Discussion
As demonstrated by the competitive results obtained on the 5 data sets evaluated, especially those in Tables 1 and 3 for the deep neural network CNN and Tables 4 and 5 for the shallow neural network FC-NN, the hyper-sinh is deemed a suitable activation function that scales from shallow to deep neural networks. In fact, its accuracy and reliability were high across both sets of benchmark image- and text-based data sets, as quantified via appropriate metrics in sub-section 2.4, and better than those of some gold standard functions. For example, in Table 1, the accuracy and the F1-score of the CNN using hyper-sinh were 0.70 and 0.69 respectively on the CIFAR-10 image-based data set, as opposed to 0.10 and 0.02 respectively for the same CNN using sigmoid. Moreover, its accuracy and reliability were comparable to those of the FC-NN using ReLU (accuracy = 0.80, F1-score = 0.79), with higher reliability than the same FC-NN when leveraging the sigmoid function on the 'Reuters' text-based data set (F1-score = 0.78). The proposed hyper-sinh also led to increased precision on the 'IMDB' text-based data set (precision = 0.87) as opposed to sigmoid and tanh (precision = 0.86), when using the same FC-NN as that leveraged to classify the 'Reuters' data set. Therefore, the hyper-sinh demonstrates that it is possible to extend the m-arcsinh to generalise across both shallow and deep neural networks for image and text classification tasks, and that the mathematical formulation of this extended function does not have to be complex at all. As an accurate and reliable activation function, the hyper-sinh is thus deemed a new gold standard activation function for both shallow and deep neural networks, freely available in TensorFlow and Keras.

Conclusion
hyper-sinh was shown to be an accurate and robust activation function for shallow and deep neural networks for image and text classification, thus being a new gold standard that scales well for FC-NN and CNN. Since it is made freely available, open source, in the Python, TensorFlow and Keras ecosystems, it adds to the selection of activation functions available to both not-for-profit and for-profit organisations when tackling image and text classification tasks with data sets of various sizes. Importantly, the proposed algorithm, being accurate and reliable, and written in a high-level programming language (Python), can be leveraged as a part of ML-based pipelines in specific use cases wherein high accuracy and reliability need to be achieved, such as in the healthcare sector (e.g., in counselling psychology), from small to large clinics, given its suitability from shallow to deep neural networks. Future work involves further improving this function to reduce its computational cost.