Hybrid Architecture based on RNN-SVM for Multilingual Handwriting Recognition using Beta-elliptic and CNN models

Currently, deep learning approaches have proven successful in handwriting recognition. Despite this, research in this field is still needed, especially for multilingual online handwriting recognition, by adopting new network architectures and combining relevant parametric models. In this paper, we propose a multi-stage deep learning-based algorithm for multilingual online handwriting recognition based on a hybrid of deep Bidirectional Long Short-Term Memory (DBLSTM) networks and SVM. The main contribution of our work lies partly in a new multi-stage architecture of deep learning networks associated with effective feature vectors that integrate dynamic and visual characteristics. First, the proposed system preprocesses the acquired script and delimits its Segments of Online Handwriting Trajectories (SOHTs). Second, two types of feature vectors, based on the Beta-Elliptic Model (BEM) and on a Convolutional Neural Network (CNN), are extracted for each SOHT in order to fuzzy-classify it into k sub-groups. This classification uses DBLSTM neural networks in both the online and offline branches, trained on sub-groups produced by an unsupervised fuzzy k-means algorithm. Finally, we combine the trained models with an SVM engine to strengthen the discrimination power of the global system. Extensive experiments on three datasets were conducted to validate the performance of the proposed method. The experimental results show the effectiveness and complementarity of the individual modules and the advantage of their fusion.


Introduction
Automatic handwriting recognition has regained importance with the spread of intelligent mobile devices such as PDAs, tablet PCs, and smartphones equipped with pen- or touch-sensitive screens, which make it easy to record online handwriting input in the so-called digital ink format. This format describes the temporal and spatial information of the sequence of points Mi(xi, yi, ti) sampling the pen or touch trajectory [6]. The growing popularity of these hand-held devices and of pen input as a natural way to interact with machines, together with recent major advances in machine learning, in particular the successful application of deep learning to speech recognition, has led researchers to develop deep learning approaches for building more efficient online handwriting recognition systems.
The variability of writing styles, together with the presence of words in several languages within the same online handwritten script, makes recognition challenging. Besides shape variations and the opposite writing directions of scripts such as Arabic and Latin, characters can take several forms depending on their position in the word, especially in Arabic script. This makes it difficult to directly adapt existing recognition systems to such tasks and calls for new architectures based on deep learning and on the combination of relevant complementary models.
One of the most important criteria for a successful online recognition system is the choice of an appropriate handwriting model. A wide range of handcrafted features, such as structural, parametric, and global features, have been described in the literature to overcome specific problems like occlusions, ligatures, and scale variations. Moreover, human handwriting movements exhibit other properties, such as speed amplitude and script size, which can change involuntarily without altering the global shape of the velocity profile that models the handwritten trajectory [16]. The hybrid kinematic and geometric aspects of online handwriting generation are captured by the Beta-elliptic model (BEM) [9], which has been applied successfully in many fields such as handwriting regeneration [12], writer identification [17], [18], and temporal order recovery [38]. In this model, the handwritten trajectory is segmented into n simple movements, called strokes, delimited by curvature or velocity extrema. In our work, we exploit the benefits of this model for online handwriting trajectory modeling.
Lately, deep learning technologies [24] have attracted considerable attention from both researchers and industry. These methods provide an alternative handwriting representation suited to end-to-end solutions. For instance, the CNN is one of the most prominent deep learning methods: it automatically extracts useful features from low to high levels and has been widely used in pattern recognition with excellent results. Recurrent neural networks (RNNs) have also been employed successfully in online handwriting recognition: with LSTM units they are very effective for handwriting sequence generation [25] and produce top results in online handwriting recognition [27]. Motivated by these results and by the great diversity of writing styles, we apply CNN-based and RNN-based models in our context. In this paper, we present a multi-stage system for recognizing online handwritten multi-script input based on DBLSTM and SVM. The main idea is to combine two different representations of the handwritten trajectory: the enhanced BEM, which describes the dynamic and geometric aspects of online data, and the CNN, a powerful feature extraction model applied after transforming the online trajectory into an image-like representation. The aim is to overcome the limits of the online module by allowing the recognition system to handle writing styles that differ in the temporal order of the ink trace but are similar in their final layout. The feature sets we employ are described in section 6. We compare the performance of the individual classifiers and of their combination, with the aim of enhancing the discrimination power of the global system, using three available datasets.
The remainder of this paper is organized as follows. Section 2 outlines the most prominent related works. Section 3 briefly describes the supported languages. Section 4 gives an overview of our proposed recognition process. Section 5 summarizes the preprocessing techniques used in our work. Section 6 describes the proposed strategy for handwritten trajectory modeling. The recognition process, which includes SOHT fuzzy classification, regularization and data augmentation, and script recognition, is presented in section 7. Finally, section 8 evaluates the proposed system on three databases and discusses the results, before the conclusion.

Related work
Online handwriting recognition has been investigated in many studies [6], [16]. It nevertheless remains an active research area, because practical applications are relatively recent and the technology is applied in many fields such as artificial intelligence, computer engineering, and image analysis. Studies dealing with online handwriting recognition can be classified into two main categories: conventional (traditional) approaches and deep learning approaches.

The conventional approaches
Conventional approaches generally require considerable expert engineering effort to design a representative feature extractor. Various studies have been carried out in this context, such as the decomposition of the online signal into elementary strokes based on the Beta-Elliptic model, characteristic strokes [10], or grapheme units based on a baseline detection algorithm as in [20]. Tangent differences and histograms of tangents are used as features to represent online Arabic characters in [32]. A framework for online Arabic character recognition based on statistical features, which does not consider the variability of characters' writing styles, is investigated in [4]. Within this context, some studies implement conventional pattern classification algorithms, including decision trees [5], template matching [19], hidden Markov models (HMM) [1], neural networks [29], and support vector machines (SVM) [44]. More recently, Zitouni et al. [60] proposed a two-stage SVM classifier for online Arabic script recognition based on a combination of beta-elliptic and fuzzy perceptual code representations. A reinforcement learning-based approach for online handwriting recognition is also proposed in [55]. It segments the handwriting trajectory into strokes, then extracts structural features using Freeman codes and visual codes, and parametric features based on beta stroke theory.

Deep learning-based approaches
Deep learning models automatically extract features from raw sequential data. Recently, the performance of online handwriting recognition systems has been greatly improved by such models, e.g., deep belief networks (DBN) [46] and convolutional neural networks [58], [2], which have been broadly applied to handwriting recognition. CNNs are considered a powerful tool for image classification, for example for offline characters represented as scanned images. Some research works apply CNNs to online characters after converting the handwriting trajectory into an image-like representation. Yuan et al. [52] proposed word-level recognition of Latin script using a CNN architecture, in which the CNN is used for online handwritten document recognition after online word segmentation. Likewise, CNNs have been widely applied to handwritten Chinese text recognition [59], where they serve as a powerful feature extraction model that automatically generates multiple low- and high-level features.
Furthermore, RNNs have been used successfully for online handwriting recognition owing to their ability to model sequential data. LSTM networks give good results for online Arabic character recognition based on the rough path signature [49]. Likewise, Ghosh et al. [23] perform cursive and non-cursive word recognition for both Devanagari and Bengali scripts using LSTM and BLSTM recurrent networks. In their work, each word is divided into three horizontal zones (upper, middle, lower); the middle zone is then resegmented into basic strokes, from which structural and directional features are extracted before training the LSTM and BLSTM networks. To improve the performance of deep BLSTM in online Arabic handwriting recognition, Maalej et al. [31] employed three enhancement techniques: dropout, Maxout, and the ReLU activation function. Applying these techniques at different positions in the BLSTM layers yields good results.
Combinations of these technologies have also been investigated. For instance, CNN has been combined with deep BLSTM (e.g., [41]): the CNN automatically generates multiple features from the scanned input handwriting sequence, while the BLSTM models the dependencies between frames within the sequence. Other combinations include RNN with BLSTM (e.g., [51]) and BLSTM with gated recurrent units (GRU) for online Chinese character recognition and generation [57]. More recently, a combination of graph CNN and LSTM was introduced by [52] for EEG emotion recognition.

Language specification
In this section, we briefly give an overview of the Arabic and Latin scripts supported by our system, considering the writing diversity of these scripts compared to others.
The Arabic script is used by multiple languages such as Persian, Kurdish, Urdu, and Pashto [41]. More than 420 million people in the world use Arabic as their main language (UNESCO 2012) [47]. Arabic script is generally written cursively: most letters are connected to their neighbors. It contains 28 basic letters plus 10 additional marks that can change the meaning of a word. For comparison, digit recognition tasks usually involve 10 classes and English has 26 alphabetic letters, while Arabic has more than 28.

Table 1. Some Arabic characters in their different forms
Arabic letters are handwritten from right to left and include small marks and dots. In addition, letters have multiple shapes depending on their position in the word. Four different forms (Beginning, Middle, Isolated, End) of Arabic letters can be distinguished according to their position within a word, as shown in Table 1. However, some characters have the same beginning and isolated shapes (e.g., Alif 'ٲ', Raa 'ر'). Also, several letters share the same body (e.g., ب, ت, ث) and differ only in the number and position of dots and diacritical marks, which may be accidentally misplaced in handwriting; Arabic character recognition is therefore more ambiguous than recognition of most other scripts. Further details on the characteristics of Arabic script and the difficulties of writing it are presented in [3].
The Latin script is the most widely adopted writing system in the world; it is used as the standard set of writing glyphs for 59 languages. Depending on the writing style, Latin script can be written cursively or semi-cursively, and character shapes vary accordingly. In addition to the basic 2 × 26 characters (uppercase and lowercase), it also supports accented characters. Unlike Arabic script, Latin script is written from left to right.

System overview
In this section, we outline our proposed system, which develops a multi-stage architecture based on a hybrid of DBLSTM recurrent neural networks and SVM for multilingual online handwriting recognition. The original purpose of our work is to adapt and exploit the effectiveness of the BEM and CNN models in online multilingual handwriting recognition. As shown in Fig. 1, the proposed system proceeds as follows. First, the input pen-tip trajectory (x, y) is denoised and normalized by a preprocessing module. Second, the handwriting trajectory is divided into continuous components called SOHTs (Segments of Online Handwriting Trajectory), each of which is a trajectory segment delimited by two successive limit points: a starting point (pen-down) and an ending point (pen-up). Third, two types of feature vectors are extracted for each SOHT: online trajectory features extracted using the BEM, and generic features generated by the last CNN layer after converting the trace into a bitmap image. After that, SOHTs are fuzzy-classified into k sub-classes defined by the unsupervised k-means algorithm in the training phase. This fuzzy classification uses the extracted BEM and CNN feature vectors as input to train two DBLSTM networks, integrated in the online and offline branches respectively. Finally, the fuzzy outputs of the two DBLSTMs are combined using an SVM to enhance the discrimination power of the global system. The following sections introduce each module in detail.
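The SOHT delimitation step can be sketched as follows. This is a minimal illustration, assuming the digitizer delivers one (x, y, t) sample per row together with a per-sample pen-down flag; the names `split_into_sohts` and `pen_down` are hypothetical and not taken from the paper. Each SOHT is a maximal run of samples recorded between a pen-down and the following pen-up.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t) sample of the pen trajectory


def split_into_sohts(points: Sequence[Point], pen_down: Sequence[bool]) -> List[List[Point]]:
    """Split a trajectory into SOHTs: maximal runs of samples recorded while
    the pen (or finger) touches the surface, i.e. from pen-down to pen-up.
    Hypothetical helper; the per-sample pen_down flag is an assumption."""
    if len(points) != len(pen_down):
        raise ValueError("points and pen_down must have the same length")
    sohts: List[List[Point]] = []
    current: List[Point] = []
    for point, down in zip(points, pen_down):
        if down:
            current.append(point)
        elif current:  # pen lifted: close the running segment
            sohts.append(current)
            current = []
    if current:  # trajectory ended while the pen was still down
        sohts.append(current)
    return sohts
```

Formats such as UNIPEN already group samples into pen-down/pen-up blocks, in which case each block can be taken directly as one SOHT.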

Preprocessing
The handwriting trajectories are collected online via a digitizing device. They exhibit high variability, which requires geometric and denoising processing steps to minimize handwriting variability, normalize size, and reduce noise. Given the raw trajectory, a Chebyshev type II low-pass filter with a cut-off frequency of fcut = 10 Hz is used to mitigate the effect of noise and of the errors due to temporal and spatial quantization introduced by the acquisition system (see Fig. 2). This cut-off frequency results from a compromise between preserving the handwriting undulations produced by young, agile writers and eliminating those due to the physiological tremor of older writers [34]. The horizontal reference lines of handwriting divide its drawing area into three zones, namely the upper, core, and lower regions. Accordingly, a size normalization procedure is applied to adjust the height of the handwriting to a fixed value h = 128 while keeping the same length/height ratio [9], [12]. Both preprocessing techniques are used and tested for all supported scripts: Arabic, Latin, and digits. After denoising and reducing the variability of the input ink signal, the next step is to divide the online signal into SOHTs, which are subsequently used in the pre-classification phase.
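The two preprocessing steps above (Chebyshev type II low-pass filtering at 10 Hz, then height normalization to h = 128 with a preserved length/height ratio) can be sketched with SciPy as below. The sampling rate (100 Hz), filter order (4), and stopband attenuation (40 dB) are assumptions, since the text does not give them; for `cheby2`, the critical frequency is the stopband edge, so 10 Hz is used there. Zero-phase filtering with `filtfilt` is also a choice made for this sketch.

```python
import numpy as np
from scipy.signal import cheby2, filtfilt


def preprocess(x, y, fs=100.0, f_cut=10.0, order=4, rs=40.0, h=128.0):
    """Low-pass filter a pen trajectory and rescale it to height h.
    fs, order and rs are assumed values, not taken from the paper."""
    # For cheby2, Wn is where the attenuation first reaches rs dB.
    b, a = cheby2(order, rs, f_cut, btype="low", fs=fs)
    # Forward-backward filtering: no phase lag between raw and smoothed trace.
    xs = filtfilt(b, a, np.asarray(x, dtype=float))
    ys = filtfilt(b, a, np.asarray(y, dtype=float))
    height = ys.max() - ys.min()
    scale = h / height if height > 0 else 1.0
    # One factor on both axes keeps the length/height ratio unchanged.
    return (xs - xs.min()) * scale, (ys - ys.min()) * scale
```

Usage: `xs, ys = preprocess(x, y)` returns a trajectory whose vertical extent is 128 units. `filtfilt` needs a few more samples than three times the filter length, so very short traces would need padding or a lower order.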

Feature extraction and SOHT pre-classification
In this section, we describe the feature extraction methods used to represent the trajectory shape, and the SOHT pre-classification step, which serves to improve recognition rates. We use two classes of features: 1) Progressive plot features, extracted with the BEM for each stroke produced by the segmentation strategy. This model lets us represent both the dynamic (velocity) and static (geometric) profiles of the online handwritten trajectory.
2) Post-drawing or perceptive features, extracted from the offline bitmap image using the CNN model.

Beta-Elliptic modeling (BEM)
The BEM derives from the kinematic Beta model combined with an analysis of the spatial profile. It considers a simple movement as the response of the neuromuscular system, described as a sum of impulse signals [35] modeled by the Beta function [6]. The specificity of the BEM is that it combines two aspects of online handwriting stroke modeling: the velocity profile (dynamic features), represented by a beta function that culminates at a time tc coinciding with a local extremum (maximum, minimum, or double inflexion point) as shown in Fig. 3.a), and an elliptic arc, illustrated in Fig. 3.b), modeling the static (geometric) profile of each stroke of the segmented trajectory. The following sub-sections describe how the BEM proceeds.

Velocity model
In the dynamic profile, the curvilinear velocity Vσ(t) is a signal that alternates between extrema (minima, maxima, and inflexion points), which delimit and define the number of trajectory strokes. In the BEM, Vσ(t) can be reconstructed by overlapping Beta signals, where each stroke corresponds to the generation of one beta impulse given by:
$$\beta(t; t_0, t_1, p, q, K) = \begin{cases} K \left(\dfrac{t - t_0}{t_c - t_0}\right)^{p} \left(\dfrac{t_1 - t}{t_1 - t_c}\right)^{q} & \text{if } t \in [t_0, t_1] \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
where t0 and t1 are respectively the starting and ending times of the generated impulse, which delimit the corresponding trajectory stroke; tc is the instant at which the beta function reaches its maximum value, as given in Eq. 2; K is the impulse amplitude; and p and q are intermediate shape parameters:
$$t_c = \frac{p\, t_1 + q\, t_0}{p + q} \qquad (2)$$
As described in Eq. 3, the velocity profile is reconstructed by overlapping the beta signals of the n strokes:
$$V_\sigma(t) = \sum_{i=1}^{n} \beta\left(t; t_{0,i}, t_{1,i}, p_i, q_i, K_i\right) \qquad (3)$$
Examples of velocity profile modeling for the online Arabic character 'ح' and word 'علم', and for the Latin character 'a', are presented in Fig. 4.a), c), and e) respectively.
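The velocity model can be checked numerically with a short sketch. It implements the standard BEM Beta impulse, K((t - t0)/(tc - t0))^p ((t1 - t)/(t1 - tc))^q on [t0, t1] and zero elsewhere, with tc = (p·t1 + q·t0)/(p + q), and sums overlapping impulses to rebuild a curvilinear velocity profile. The function names and the (t0, t1, p, q, K) tuple layout are choices made for this illustration; estimating the parameters from a recorded trajectory is not shown.

```python
import numpy as np


def beta_impulse(t, t0, t1, p, q, K):
    """Beta impulse: zero outside [t0, t1], maximum value K reached at tc."""
    t = np.asarray(t, dtype=float)
    tc = (p * t1 + q * t0) / (p + q)  # instant of the maximum
    out = np.zeros_like(t)
    inside = (t >= t0) & (t <= t1)
    ti = t[inside]
    out[inside] = K * ((ti - t0) / (tc - t0)) ** p * ((t1 - ti) / (t1 - tc)) ** q
    return out


def velocity_profile(t, strokes):
    """Curvilinear velocity as the sum of overlapping beta impulses.
    `strokes` is a list of (t0, t1, p, q, K) tuples, one per stroke."""
    t = np.asarray(t, dtype=float)
    return sum((beta_impulse(t, *s) for s in strokes), np.zeros_like(t))
```

For example, a stroke with t0 = 0, t1 = 1, p = 2, q = 3 peaks at tc = 0.4, and `velocity_profile(t, [s1, s2])` overlaps two consecutive strokes whose time supports intersect, which is what produces the velocity minima between strokes.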

Trajectory modeling
In the space domain, many studies have addressed handwriting generation. In Bezine et al. [12], each stroke located between two points M1 and M2 is assimilated to an elliptic path satisfying Eq. 4, where X and Y denote the Cartesian coordinates along the elliptic stroke, and a and b are respectively the half-lengths of the large and small axes:
$$\frac{X^2}{a^2} + \frac{Y^2}{b^2} = 1 \qquad (4)$$
An elliptic arc is also described by the tangents of the trajectory at its endpoints M1 and M2, as introduced in [11]. Each elementary beta stroke located between two successive velocity extrema can thus be modeled by an elliptic arc characterized by four geometric parameters, a, b, θ, and θp, as shown in Fig. 3.b), where a and b are respectively the half-lengths of the large and small axes of the elliptic arc, θ is the inclination angle of the ellipse's major axis, and θp is the inclination of the trajectory tangent at the minimum-velocity endpoint. These parameters reflect the geometric properties of the end-effector (pen or finger) trace, driven by the set of muscles and joints involved in handwriting. Fig. 4.b), d), and f) show examples of geometric profile modeling for the same Arabic and Latin samples.
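The geometric parameters can be illustrated by generating an elliptic arc from (a, b, θ) and computing a tangent inclination, the quantity θp refers to. This sketch assumes an eccentric-angle parametrization (X = a cos φ, Y = b sin φ, so that X²/a² + Y²/b² = 1 holds) followed by a rotation by θ and a translation; the arc center and the angular range are illustrative inputs. Fitting a, b, θ, and θp to a real stroke, and locating its minimum-velocity endpoint, are not shown.

```python
import numpy as np


def elliptic_arc(a, b, theta, phi_start, phi_end, center=(0.0, 0.0), n=50):
    """Sample n points of an elliptic arc: half-axes a (large) and b (small),
    major axis inclined by theta, eccentric angle phi in [phi_start, phi_end]."""
    phi = np.linspace(phi_start, phi_end, n)
    X, Y = a * np.cos(phi), b * np.sin(phi)  # ellipse frame: X^2/a^2 + Y^2/b^2 = 1
    c, s = np.cos(theta), np.sin(theta)
    x = center[0] + c * X - s * Y  # rotate by theta, then translate
    y = center[1] + s * X + c * Y
    return x, y


def tangent_inclination(a, b, theta, phi):
    """Inclination (radians) of the arc tangent at eccentric angle phi."""
    dX, dY = -a * np.sin(phi), b * np.cos(phi)  # derivative in the ellipse frame
    c, s = np.cos(theta), np.sin(theta)
    return np.arctan2(s * dX + c * dY, c * dX - s * dY)  # tangent rotated by theta
```

Rotating the sampled points back by -θ around the center recovers coordinates that satisfy the ellipse equation, which is a quick sanity check when fitting arcs.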

Hybrid Beta and Elliptical models
As mentioned previously, each simple stroke is described by 10 parameters, summarized in Table 2. The first six (beta) parameters describe the overall temporal properties of the neuromuscular networks involved in motion generation, whereas the last four (elliptic) parameters describe the global geometric properties of the muscles and joints involved in executing the movement.
Table 2. Features extracted using BEM.

Post drawing CNN feature
In this sub-section, we briefly explain how the CNN model is used in our context. Inspired by the recent successes of deep learning features in different fields, we apply a CNN architecture to the offline bitmaps reconstructed from the SOHT database. These bitmaps represent only the final layout of the drawing and disregard the chronological order of its generation. CNN features are extracted from the offline SOHTs by a convolutional network with multiple convolution and max-pooling layers; batch normalization and dropout are also used to improve feature extraction. The details of the CNN module are given in the experimental section. The 32×32 input image is transformed into a CNN feature tensor of size (BatchSize, L, D), where BatchSize is fixed to 32 and L and D are respectively the length and depth of the CNN feature. As shown in Fig. 5, the CNN output of the trajectory modeling step is denoted F = (F1, F2, ..., FN), with Fi ∈ ℝ^D.
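The conversion of an SOHT into a 32×32 bitmap is not detailed in the text; the sketch below shows one plausible way to do it, assuming a binary image, uniform scaling that preserves the aspect ratio, centering inside a 1-pixel margin, and densely sampled line segments between consecutive points. The function `rasterize` and its parameters are hypothetical. Rows follow the y coordinate directly, so flip the image if the device's y axis points upward.

```python
import numpy as np


def rasterize(x, y, size=32, margin=1):
    """Render one SOHT as a size x size binary bitmap (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    box = size - 1 - 2 * margin                 # drawable extent in pixels
    w, h = x.max() - x.min(), y.max() - y.min()
    scale = box / max(w, h, 1e-9)               # uniform scale keeps the aspect ratio
    px = (x - x.min()) * scale + margin + (box - w * scale) / 2  # center horizontally
    py = (y - y.min()) * scale + margin + (box - h * scale) / 2  # center vertically
    img = np.zeros((size, size), dtype=np.float32)
    img[int(np.rint(py[0])), int(np.rint(px[0]))] = 1.0
    for x0, y0, x1, y1 in zip(px[:-1], py[:-1], px[1:], py[1:]):
        n = int(np.ceil(2 * np.hypot(x1 - x0, y1 - y0))) + 1  # about 2 samples per pixel
        cols = np.rint(np.linspace(x0, x1, n)).astype(int)
        rows = np.rint(np.linspace(y0, y1, n)).astype(int)
        img[rows, cols] = 1.0
    return img
```

A gray-level rendering (for instance with anti-aliasing or a stroke width) would match the "gray-level image" input mentioned in the experiments more closely; the binary version keeps the sketch short.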

SOHTs pre-classification
After segmenting the handwriting trajectories into SOHTs and extracting their offline visual features, we obtained a large database of multilingual handwriting segments with an indefinite number of labels, owing to the large variability of writers' handwriting styles, especially when cursive and discrete styles are mixed. Since manually labeling each SOHT is not feasible, we use the unsupervised k-means clustering algorithm for SOHT pre-classification in the training track (see Fig. 5, training track). The number of sub-groups K is chosen empirically so as to maximize the recognition rate: changing K modifies the accuracy of each individual classifier and, consequently, that of the overall fusion system.
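The pre-classification step relies on k-means; a minimal NumPy version of Lloyd's algorithm is sketched below. It returns the labels, the centroids, and the within-cluster sum of squared point-to-centroid distances (WCSS). The deterministic farthest-first seeding is a choice made for this sketch, not something stated in the paper; library implementations usually default to k-means++ seeding.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-first seeding.
    Returns (labels, centroids, wcss)."""
    X = np.asarray(X, dtype=float)
    # Seeding: start from the first sample, then repeatedly add the sample
    # that is farthest from every centroid chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        labels = d2.argmin(axis=1)  # assignment step
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])  # update step; empty clusters keep their centroid
        if np.allclose(new, centroids):
            break
        centroids = new
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return labels, centroids, d2[np.arange(len(X)), labels].sum()
```

The full (n, k, d) distance tensor is fine for small examples. At the scale of hundreds of thousands of SOHTs and hundreds of clusters, it should be computed in chunks, or an optimized library implementation should be used instead.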

Fig. 5. SOHTs pre-classification and online handwriting script recognition process.

Sequence Recognition
In our work, we introduce hybrid online and offline models based on DBLSTM for SOHT fuzzy classification, and an SVM for handwriting sequence (word, letter, digit, etc.) recognition by decision fusion. This section presents the different modules of our recognition process.

DBLSTM for SOHTs fuzzy classification
This step consists of classifying all SOHTs into their K groups using a deep RNN of the BLSTM type. This choice of network is motivated by the fact that handwriting trajectories are composed of a sequence of beta strokes over time, and by the ability of RNNs to model sequential data. Recurrent networks are often said to have memory, since they maintain a state vector that implicitly contains information about the history of all past elements of a sequence: at each time step, they take as input the current information plus what they received previously. The major limitation of RNN architectures is that error gradients vanish exponentially quickly with long-term dependencies. To deal with this problem, Hochreiter et al. [28] introduced the LSTM network, which is frequently used to learn longer-term dependencies and reduce the vanishing gradient problem [13]. At each time step t, given the input xt and the previous hidden state h(t-1), the LSTM computes its input, forget, and output gates and a candidate cell state:
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \qquad (5)$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \qquad (6)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \qquad (7)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \qquad (8)$$
where W* is the input-to-hidden weight matrix, U* is the state-to-state recurrent weight matrix, b* is the bias vector, and σ is the logistic sigmoid.
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \qquad (9)$$
$$h_t = o_t \odot \tanh(c_t) \qquad (10)$$
The gates it and ft control the update of ct (Eq. 9), which stores the long-term memory, and ⊙ denotes the element-wise vector product. The output gate ot controls the update of ht, as shown in Eq. 10. Many tasks require both past and future context, whereas a unidirectional RNN uses only past context. To build a bidirectional LSTM model, we combine two LSTM sub-layers running in the forward and backward directions [42]. The details of the BLSTM architecture are given in the experimental section.
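A forward pass of the recurrence can be sketched in NumPy to make the gate equations and the bidirectional combination concrete. The sketch uses the standard LSTM formulation (sigmoid input, forget, and output gates; tanh candidate; c = f⊙c_prev + i⊙c̃; h = o⊙tanh(c)). The bidirectional layer concatenates a forward pass with a time-reversed backward pass, and stacking such layers gives a deep BLSTM. Weight shapes, the random initialization, and all function names are illustrative assumptions. Training, dropout, and the output layer are omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_layer(X, W, U, b):
    """One LSTM direction over X (T x d_in). W[g]: (H x d_in), U[g]: (H x H),
    b[g]: (H,), for gates g in 'i', 'f', 'o' and candidate 'c'."""
    H = b["i"].shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x in X:
        i = sigmoid(W["i"] @ x + U["i"] @ h + b["i"])        # input gate
        f = sigmoid(W["f"] @ x + U["f"] @ h + b["f"])        # forget gate
        o = sigmoid(W["o"] @ x + U["o"] @ h + b["o"])        # output gate
        c_tilde = np.tanh(W["c"] @ x + U["c"] @ h + b["c"])  # candidate cell state
        c = f * c + i * c_tilde                              # long-term memory update
        h = o * np.tanh(c)                                   # hidden state
        outputs.append(h)
    return np.stack(outputs)


def blstm_layer(X, fwd, bwd):
    """Forward pass on X, backward pass on reversed X, concatenated: (T x 2H)."""
    h_f = lstm_layer(X, *fwd)
    h_b = lstm_layer(X[::-1], *bwd)[::-1]  # re-align backward states with time
    return np.concatenate([h_f, h_b], axis=1)


def random_params(d_in, H, rng):
    """Illustrative small random weights and zero biases for one direction."""
    W = {g: 0.1 * rng.standard_normal((H, d_in)) for g in "ifoc"}
    U = {g: 0.1 * rng.standard_normal((H, H)) for g in "ifoc"}
    b = {g: np.zeros(H) for g in "ifoc"}
    return W, U, b
```

For example, a sequence of 7 beta strokes with 10 features each passes through `blstm_layer(X, random_params(10, H, rng), random_params(10, H, rng))`. The next layer then takes inputs of size 2H.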
As mentioned previously, each trajectory T is composed of a number of SOHTs, T = {SOHT1, SOHT2, …, SOHTn}. In the training track, each SOHT sample is assigned to its most likely group Cj, j = 1…K, according to its offline visual features obtained from the CNN. This unsupervised assignment is then used to train two DBLSTM networks for SOHT fuzzy labeling. The labeling stage considers two feature sets for each handwriting segment SOHTi, i = 1…n: the first, Xi_on, combines the dynamic and geometric features extracted with the BEM, while the second, Xi_off, contains the post-drawing features extracted by the CNN. As shown in Fig. 5, the output of each DBLSTM is a vector of size K, {P(Xi_on|C1), P(Xi_on|C2), …, P(Xi_on|CK)} and {P(Xi_off|C1), P(Xi_off|C2), …, P(Xi_off|CK)}, representing the membership probabilities of the i-th handwritten trajectory segment to the K SOHT groups for the online and offline branches respectively.
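The construction of the per-trajectory membership matrix can be sketched as follows. This assumes that each DBLSTM ends with a softmax layer, which is consistent with the categorical cross-entropy loss reported in the experiments but not stated explicitly. Under that assumption, the n × K raw output scores of a trajectory become a matrix whose rows sum to one, and the arg-max of a row gives the hard group assignment. The function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def membership_matrix(logits):
    """Map per-SOHT output scores (n x K) to fuzzy memberships P(i, j):
    row i holds the memberships of SOHT i to the K groups and sums to 1."""
    return softmax(np.asarray(logits, dtype=float), axis=1)
```

Applied to both branches, this yields the online and offline matrices of the same trajectory, each with one row per SOHT.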

Regularization and data augmentation
Regularization is an important process for enhancing the performance of deep recurrent neural networks. The most commonly used regularization method is dropout [43], which avoids over-fitting during training by randomly dropping hidden units. In our case, we apply dropout to both the input and the fully connected layers with a probability of 0.3, similar to the approach taken in [57]. A large amount of training data is another key to the success of deep neural networks. To enlarge our training dataset, we adopt a data augmentation strategy widely used in image-based recognition systems [22], which generates randomly distorted samples from those of the original dataset. For this purpose, we modify parameters such as the inclination angle of the trajectory, baselines, italicity (slant), and smoothing. These techniques broaden the training set by generating more training samples while introducing more variation.
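Two of the distortions listed above, inclination and italicity, can be sketched as geometric transforms of the online trajectory. The sketch applies a small random rotation around the trajectory centroid, which changes the inclination and baseline, followed by a horizontal shear, which changes the slant. The ranges (±10° and ±0.2) are illustrative assumptions, and smoothing is not shown. The function name `distort` is hypothetical.

```python
import numpy as np


def distort(x, y, max_rot_deg=10.0, max_shear=0.2, rng=None):
    """Randomly rotate then shear a trajectory around its centroid.
    Parameter ranges are illustrative, not taken from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    angle = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = rng.uniform(-max_shear, max_shear)
    pts = np.stack([np.asarray(x, float), np.asarray(y, float)])  # 2 x n
    center = pts.mean(axis=1, keepdims=True)
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle), np.cos(angle)]])
    shr = np.array([[1.0, shear], [0.0, 1.0]])  # x' = x + shear * y (slant change)
    out = shr @ rot @ (pts - center) + center
    return out[0], out[1]
```

Because the transform acts on the point sequence itself, the temporal order of the samples is preserved. Augmented samples can therefore feed both the BEM branch and, after rasterization, the CNN branch.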

SVM fusion engine for scripts recognition
After SOHT fuzzy labeling, the outputs of the DBLSTMs of both the online and offline sub-systems are used as input to the SVM classifier for decision fusion. The SVM is a powerful supervised learning tool for linear and nonlinear classification that has been highly successful in many practical applications such as pattern recognition. Unlike traditional artificial neural networks, the SVM is formulated in terms of structural risk minimization rather than empirical risk minimization. As shown in Fig. 7.a), the SVM determines an optimal separating hyperplane after mapping the sample points into a high-dimensional feature space through a nonlinear transformation. It was originally designed for binary classification, but it can also solve multiclass problems (see Fig. 7.b)) using methods such as one-versus-all [15]. The mapping is handled implicitly through kernel functions, i.e., dot products in the feature space, such as the linear, Radial Basis Function (RBF), and sigmoid kernels. In our hybrid model, we use the RBF kernel, defined as:
$$k(x, x') = \langle \varphi(x), \varphi(x') \rangle = \exp\left(-\gamma \lVert x - x' \rVert^2\right) \qquad (11)$$
where γ is a parameter determined empirically, x and x' are two input vectors, and φ is the nonlinear mapping into the feature space. To recognize the overall handwritten script, the membership vectors of the SOHTs belonging to the same online trajectory (word, letter, digit, etc.) are gathered to form the fuzzy SOHT membership matrices Pon(i,j) and Poff(i,j). These matrices are provided as input to the SVM, which merges the local decisions and determines the class of the overall handwritten script.
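The RBF kernel exp(-γ‖x - x'‖²) used by the fusion SVM can be computed for whole batches of vectors as shown below. The text does not say how matrices with a variable number n of SOHTs become a fixed-length SVM input. The helper `fuse_inputs` therefore assumes zero-padding both n × K membership matrices to a hypothetical maximum `n_max` and concatenating them after flattening.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    A = np.atleast_2d(np.asarray(A, float))
    B = np.atleast_2d(np.asarray(B, float))
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from round-off


def fuse_inputs(P_on, P_off, n_max):
    """Hypothetical: zero-pad the (n x K) online and offline membership
    matrices to n_max rows and flatten them into one fixed-length vector."""
    def pad(P):
        P = np.asarray(P, float)
        if P.shape[0] > n_max:
            raise ValueError("trajectory has more SOHTs than n_max")
        return np.vstack([P, np.zeros((n_max - P.shape[0], P.shape[1]))]).ravel()
    return np.concatenate([pad(P_on), pad(P_off)])
```

In practice, an off-the-shelf multiclass SVM with an RBF kernel (for instance scikit-learn's `SVC(kernel="rbf", gamma=...)`) computes the same kernel internally. The fused vectors and script labels would then be passed to its `fit` method, with γ tuned on the validation set.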

Evaluation results
In this section, we describe our experiments on multilingual online handwriting recognition. We first present the datasets used, followed by ablation studies and a discussion of the results. Two metrics are used to evaluate the proposed system: the Character Error Rate (CER) and the Word Error Rate (WER), defined as the percentage of characters or words incorrectly recognized. These metrics are computed for the output of each DBLSTM and for their fusion with the SVM engine. We then compare our approach with state-of-the-art approaches on the same databases. Finally, we present strengths and limitations of the system in the error analysis section. All tests were run on a Core i7 processor at 3.2 GHz with 8 GB of RAM.
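For isolated characters and words, the CER and WER defined above are the percentage of misrecognized items. A minimal sketch follows; the function name is illustrative. When whole text lines are transcribed, CER and WER are more commonly computed from the edit (Levenshtein) distance between predicted and reference sequences instead.

```python
def error_rate(predicted, reference):
    """Percentage of items (characters or words) whose predicted label
    differs from the reference label, compared position by position."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("empty evaluation set")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return 100.0 * errors / len(reference)
```

For example, one wrong label out of four isolated characters gives a CER of 25%.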

Datasets overview
One of the most difficult aspects of online handwriting recognition is the need for a standard database that covers a variety of writing styles and includes the various classes of the target language. To test the performance and efficiency of our proposed system, we used three datasets: Online-KHATT [30] for isolated Arabic characters, ADAB [8] for Arabic words, and UNIPEN [26] for Latin characters and digits. This section describes these publicly available datasets.

Online-KHATT dataset
Online-KHATT is a new open-vocabulary Arabic dataset proposed by [4]. It is composed of 10,040 lines of online Arabic text taken from 40 books, written by about 623 participants using Android and Windows-based devices. According to its authors, this database is challenging because it involves many problems such as stroke thickness, writing styles, and the number and position of dots. In our experiments, we use a subset of the segmented characters of this dataset [3] that contains 44,795 samples after data augmentation. The subset is partitioned into a training set of 30,795 samples, a validation set of 4,000, and a test set of 10,000. As described in the previous section, letter shapes differ depending on their position in the word, which yields a total of 114 different shapes in this dataset.

Unipen benchmark dataset
The Unipen dataset contains three sections, 1a, 1b, and 1c, for digits, uppercase, and lowercase Latin characters respectively. The sizes of these sections are 16K, 28K, and 61K samples respectively. The database is divided into training, validation, and test sets. Table 3 summarizes the total number of characters in each section.

ADAB dataset
The ADAB (Arabic DAtaBase) has been used extensively in the literature. It consists of more than 33,000 Arabic words handwritten by 170 different writers, drawn from 937 Tunisian town/village names. As shown in Table 4, this database is divided into six distinct sets, used in the ICDAR 2011 online Arabic handwriting recognition competition [8].

Table 4. ADAB dataset description
Set      Number of words   Number of pseudo words   Writers
Set 1    5037              40296                    56
Set 2    5090              25450                    37
Set 3    5031              15093                    39
Set 4    4417              22085                    25
Set 5    1000              4000                     6
Set 6    1000              8000                     3
Total    21575             114924                   166

To increase training accuracy, we applied the data augmentation technique. We used sets 1, 2, and 3, which contain more than 45,158 words, for training, sets 5 and 6 for validation, and the 4,417 words of set 4 for testing.

Ablation studies
To study the impact of the proposed architecture for recognizing multilingual scripts with hybrid DBLSTM and SVM engines on both the online and offline branches, we designed three groups of experiments. The first experiment concerns the pre-classification step and consists of choosing the value of K that yields the best grouping of SOHTs. After trajectory segmentation, we build a dataset of more than 200,000 SOHTs, which are then clustered into K groups with the k-means algorithm using the extracted BEM parameters. After several tests, K was empirically fixed to 210, the number of groups that returned the smallest within-cluster sum of point-to-centroid distances. 70% of this dataset is used to train the next stage, composed of DBLSTMs, and the rest for testing.

Structure of CNN feature extraction
The second experiment consists of selecting the best CNN feature extractor. We tried several CNN configurations to train the offline system on the SOHT training dataset. As illustrated in Table 5, three variants of the CNN model were designed by changing the number of layers and the number of filters in each layer. We chose CNN3, with 13 convolution layers and 4 max-pooling layers, similar to Zhang et al. [58], because it is the most efficient architecture, achieving the highest accuracy on the SOHT dataset. The input layer takes a 32×32 gray-level image. The convolutional layers use 3×3 filters with a fixed stride of one. The number of feature maps increases gradually from 50 in layer 1 to 400 in layer 12. After every three convolutional layers, a 2×2 max-pooling window with stride 2 halves the size of the feature map. The network is trained with stochastic gradient descent with momentum ('sgdm'), a mini-batch size of 100, and a learning rate of 0.001. Dropout with a rate of 0.2 is applied only in the last layer.
Table 5. CNN configurations: "Conv" denotes a Conv, Norm-ReLU layer. The parameters of the convolution and max-pooling layers are written as "Conv: filter size (number of filters)" and "Max pool: filter size", respectively.
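The CNN3 layout described above can be checked with a small shape calculation. This pure-Python sketch traces only the spatial side length of the feature maps. It assumes a padding of 1 for the 3×3 convolutions (the text does not state the padding) and assumes that the 13th convolution follows the fourth pooling layer; the channel widths between 50 and 400 are not modeled, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Side length after a convolution (pad=1 is an assumption: 'same' for 3x3)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Side length after a max-pooling layer."""
    return (size - window) // stride + 1


def trace_cnn3(input_size=32, n_conv=13, pool_every=3, n_pools=4):
    """Follow the feature-map side length through the assumed CNN3 layout."""
    size, pools, trace = input_size, 0, []
    for layer in range(1, n_conv + 1):
        size = conv_out(size)
        trace.append((f"conv{layer}", size))
        # A 2x2, stride-2 max-pool follows every third convolution, up to n_pools times.
        if layer % pool_every == 0 and pools < n_pools:
            size = pool_out(size)
            pools += 1
            trace.append((f"pool{pools}", size))
    return trace


trace = trace_cnn3()
for name, side in trace:
    print(name, side)
print(trace[-1])  # ('conv13', 2)
```

Under these assumptions, the four poolings reduce the 32×32 input to 16, 8, 4, and finally 2 pixels per side.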

DBLSTM parameters
In the proposed system, we used the same deep BLSTM architecture for the fuzzy classification of SOHTs on both online and offline data. The network hyperparameters were fixed after several tests. The best topology is composed of three bidirectional hidden layers. The size of the input layer depends on the dimension of the feature vector, obtained with BEM or CNN respectively. We also varied the number of nodes (64, 128, 256, and 400) in the forward and backward hidden layers. The best number of nodes applied in our DBLSTM is 400, as described in the experimental results section. The size of the output layer is defined by the number of SOHT classes. Dropout with a probability of 0.3 is applied to the fully connected layer. The network is trained with stochastic gradient descent with momentum ('sgdm') and a mini-batch size of 200, starting from an initial learning rate of 0.001, for at most 400 epochs. A categorical cross-entropy loss function is used to optimize the network. After each epoch, the training data are shuffled to form different mini-batches.
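The training ingredients listed above (reshuffling before each epoch, mini-batches of 200, categorical cross-entropy, and 'sgdm' with a learning rate of 0.001) can be illustrated with the minimal pure-Python sketch below. It is not the authors' code: the BLSTM itself is omitted, the helper names are hypothetical, and the momentum value of 0.9 is an assumption because the text does not state it.

```python
import math
import random


def shuffled_minibatches(n_samples, batch_size, rng):
    """Reshuffle sample indices and cut them into mini-batches (once per epoch)."""
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]


def softmax(logits):
    """Turn output-layer scores into class probabilities (fuzzy memberships)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs, target):
    """Loss for one sample whose true SOHT group index is `target`."""
    return -math.log(probs[target])


def sgdm_step(params, grads, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum*v - lr*g, then w <- w + v."""
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * g
        params[i] += velocity[i]


rng = random.Random(0)
for epoch in range(2):
    batches = shuffled_minibatches(1000, 200, rng)
    print(epoch, [len(b) for b in batches])  # five batches of 200 in each epoch

probs = softmax([2.0, 1.0, 0.1])
print(round(categorical_cross_entropy(probs, 0), 3))
```

In a full implementation, the gradients passed to `sgdm_step` would come from backpropagating the averaged cross-entropy of each mini-batch through the DBLSTM.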

Experimental Results
One of the most important problems in handwriting recognition is the selection of a relevant feature set, which has been the subject of several studies [33]. The focus of our work is to show that the proposed method, which combines the two handwriting representation approaches described above, is useful for online handwriting recognition. To assess its effectiveness, we designed three groups of experiments: the first is based on dynamic and geometric features obtained with BEM, the second on bitmap features from the CNN model, and the third on the fusion of the two.

Experiments on Online-KHATT
To evaluate the performance of our proposed system on Arabic characters, we used the Online-KHATT dataset described previously. As shown in Fig. 8 and Table 6, we further studied the number of layers and nodes per layer for both BEM and CNN features to determine the optimal size of the BLSTM network. For both input features, 3 layers outperform shallower networks, while adding more layers brings almost no improvement. Further, 256 nodes per layer are sufficient, as larger networks give only limited improvements, if any. The CER is significantly lower with CNN features than with BEM, which we attribute to the useful offline features generated by the CNN model, offering a fair compromise between character class discrimination and the grouping of different writing styles. For the last stage of the proposed system, we combined the outputs of the DBLSTMs to improve the results. To this end, we tested the SVM engine with different kernel functions. As shown in Table 7, combining the two models with an RBF kernel reduces the CER on the Online-KHATT dataset to 5.25%. Moreover, we compare our system with others in the literature that also address online Arabic character recognition. The results are summarized in Table 8, which presents for each work the classifier, the feature extraction model, and the accuracy. The results of our system on the Online-KHATT dataset are significantly better than those of [4] and [48], which are trained on the same training dataset.
We attribute this, on the one hand, to the complementary handwriting representation models: the simultaneous geometric and dynamic features extracted with BEM, and the deep CNN features, whose discriminating power helps differentiate Arabic characters. On the other hand, combining the models enhances the discriminating power of the global system.
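The decision-level fusion described above can be pictured with the minimal sketch below. It assumes that the SVM input is the concatenation of the online (BEM) and offline (CNN) DBLSTM output vectors, which the text implies but does not spell out, and it shows only the binary RBF-kernel decision function f(x) = Σ αᵢyᵢ K(sᵢ, x) + b. The support vectors and coefficients are hypothetical toy values; in practice they come from SVM training, and a multi-class SVM combines several such binary functions (the text does not say how).

```python
import math


def fuse(online_posteriors, offline_posteriors):
    """Fusion input: concatenate the online and offline DBLSTM output vectors."""
    return list(online_posteriors) + list(offline_posteriors)


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Binary SVM score sum_i (alpha_i * y_i) * K(s_i, x) + b; its sign is the class."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical toy values (two classes, two branches with two outputs each).
x = fuse([0.8, 0.2], [0.6, 0.4])
support_vectors = [fuse([0.9, 0.1], [0.9, 0.1]), fuse([0.1, 0.9], [0.1, 0.9])]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i
score = svm_decision(x, support_vectors, dual_coefs, bias=0.0)
print("class +1" if score > 0 else "class -1")  # prints "class +1"
```

Here the offline branch is less confident than the online one, but the fused vector is still closer to the first support vector, so the combined decision follows the agreeing evidence of both branches.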

Experiments on UNIPEN
We evaluate the performance of our system on the test set of online signals from the UNIPEN dataset, distributed over the three sections 1a, 1b, and 1c. Table 9 compares our results with those of previous studies on the same database. The results achieved by our system with the combination of the two models are better than those of the 'writer-independent' experiment reported by [7], which uses cluster generative statistical dynamic time warping (CSDTW). This indicates that our proposed method is also robust and very promising for online handwritten Latin characters and digits. We also compare with the more recent work [48], which uses LSTM with a CTC approach, and note that our recognition rate is very competitive. It should be noted that UNIPEN contains very difficult data because of the great variety of writers and the presence of mislabeled or noisy samples.

Experiments on ADAB
We also carried out experiments on the test set of the ADAB database using the same network architecture. As summarized in Table 10 and Fig. 9, the WER results show the effectiveness of our architecture with single models. As for Online-KHATT, the lowest WER is achieved with 3 BLSTM layers of 256 nodes. The performance of our system improves further when both DBLSTM models are combined: as shown in Table 7, the lowest WER, 1.28%, is obtained using the SVM with an RBF kernel.
To better assess the efficiency and robustness of our system, we compare its performance with that of previous state-of-the-art systems. Although the association of online and offline data has been used in related works on handwriting recognition, our system differs from [21] and [1] by integrating multi-stage deep learning networks for the analysis and fuzzy classification of both online and offline data. In addition, we use an SVM engine to fuse, at the decision level, the fuzzy classification results of the online and offline branch classifiers. This architecture demonstrated its efficiency in the test phase, achieving a WER of 1.28%, which compares favorably with the state of the art, in particular with the system of [21], which uses a deep learning network only in the offline data processing branch.
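The CER and WER values reported in Tables 6, 7, and 10 are error rates normalized by reference length. The text does not spell out its scoring protocol, so the sketch below assumes the common edit-distance definition: the total Levenshtein distance between recognized and reference sequences divided by the total reference length, computed over characters for CER and over whitespace-separated words for WER. For an isolated-word test set such as ADAB, and assuming each label counts as a single token, this reduces to the fraction of misrecognized words. The function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: substitutions, insertions, and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(references, hypotheses, tokenize):
    """Total edit distance over total reference length, in percent."""
    errors = sum(edit_distance(tokenize(r), tokenize(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(tokenize(r)) for r in references)
    return 100.0 * errors / total


refs = ["abc", "hello world"]
hyps = ["abd", "hello word"]
cer = error_rate(refs, hyps, list)       # character level
wer = error_rate(refs, hyps, str.split)  # word level
print(round(cer, 2), round(wer, 2))      # 14.29 66.67
```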

Error Analysis
In this subsection, we illustrate the strengths and limitations of the present system with examples of correctly classified and misclassified sequences. Fig. 10 shows some examples of handwritten characters, words, and digits from the datasets used, processed by our recognition system. Each item is annotated with the label of the corresponding handwritten sample. Some samples are not recognized by one of the models but are corrected by the other, and vice versa. For instance, the first and third top samples of the correctly classified partition are recognized by the online BEM model but not by the CNN. Likewise, the second, fourth, and right fifth samples are recognized only by the CNN model. These observations show the complementarity of the two modules, which explains why combining them increases the accuracy of the global system. However, despite these good results, the results obtained on the Online-KHATT and UNIPEN datasets remain to be improved, due to the great diversity of writing styles and the similarity between many characters. Some examples in the error partition are quite distorted, such as the first sample of 'ح' and the third sample of 'ر' for the Arabic script, as well as the '3' and '9' digit samples.

System | Feature | WER (%)
SVM [21] | BEM + CDBN | 2.49
HMM [1] | Directional features + Offline features | 2.22
MLP/HMM [45] | BEM features | 3.55
SVM [60] | BEM + perceptual code | 1.

We also notice that the first and fourth samples of 'ر', as well as the first and third samples of 'د', are legible but misclassified, most likely because their appearance is rare among the training examples of these letters. Likewise, there is confusion between different positional forms of the same character, such as isolated 'و' and final 'ـو', and between identical positional forms of different characters, such as medial 'ـبـ' and medial 'ـيـ', which can be explained by the wrong identification of the delayed stroke. For the Latin script, other unrecognized examples include the three samples of the uppercase characters 'F' and 'C' and of the lowercase character 'b', which can be explained by the diversity of writing styles and the similarity between many characters of this script.

Conclusion and Future Work
We presented in this study a multilingual online handwriting recognition system based on multi-stage DBLSTM recurrent neural networks. The proposed system divides the online handwriting trajectory into SOHTs, which are then pre-classified into sub-groups. Two online handwriting representation models are then employed: the BEM, which captures dynamic and geometric features of the online pen-tip trajectory, and powerful features extracted with a CNN model. We use the extracted features for classification and compare the performance of the single DBLSTM classifiers. We also combine their outputs using an SVM to enhance the discriminating power of the global system. The efficiency of the individual models and of their combination was evaluated in experiments on three databases: Online-KHATT for online Arabic characters, ADAB for Arabic words, and UNIPEN for Latin characters and digits. The results obtained by combining the models suggest that our proposed method is well suited to online handwriting recognition compared with other state-of-the-art works, owing to the complementarity and strength of both models.
We also observed that the feature extraction models used are rather generic; applying them to other scripts such as Persian, Chinese, and English is of interest and is envisaged as future work. We also plan to extend our method to text recognition.