An Empirical Investigation of the Effectiveness of Optical Recognition of Hand-Drawn Business Process Elements by Applying Machine Learning

Business process diagrams represent an essential part of business process management activities, including process analysis, process-related communication, and process automation. While the majority of these artifacts are produced with modeling tools, there are still cases where hand-drawn diagrams are created, especially in the initial phases of process discovery and process innovation activities. The transformation of a hand-drawn diagram to digital format often presents several challenges. Thus, our efforts are directed towards investigating the effectiveness of an automatic transformation of hand-drawn diagrams into digital artifacts utilizing optical character recognition. To this aim, we performed empirical research in which subjects were instructed to redraw standardized process elements, which were afterward used as the training and testing sets of a machine-learning-based character recognition application. The findings obtained in the analysis show that a solution based on and trained with TensorFlow is capable of identifying hand-drawn process diagram elements with varying levels of accuracy. These insights may be considered when specifying or adapting the visual vocabularies of notations, assuring appropriate visual distances between the depictions of individual elements.


I. INTRODUCTION
Business processes are organizational assets that are central to creating value for customers. This implies the need to manage the 'key processes' to produce consistent value and to improve organizational performance indicators [1]. The success of business process management (hereinafter BPM) depends on transparent and continuously improving business processes, which mostly result from business process modeling techniques, approaches, and tools [2], [3].
Business process modeling is concerned with the representation of organizational processes, so that current processes may be analyzed and improved in the future. Besides, business process modeling is not only a requirement for many quality management systems in organizations but also plays an important role in the implementation of workflow management and enterprise resource planning systems [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Vicente Alarcon-Aquino.
Business process modeling is mainly performed with IT solutions (i.e. modeling tools), whose aim is to support the management of the assets involved within the BPM lifecycle [4]. Business process modeling activities result in a business process diagram, a visual artifact typically expressed in a graph-based process modeling notation. The primary purpose of a process diagram is to provide a means for standardized and more effective communication between process-related stakeholders, especially analysts [5]. Secondarily, business process diagrams serve as a starting point for process analysis, simulation, improvement, and automation.
Despite a variety of available process modeling tools, many experts still prefer to create process diagrams in a hand-drawn manner, especially in the initial phases of process discovery activities. Modeling diagrams 'by the pen' may be associated with benefits similar to those of handwriting, e.g. unconventional possibilities of expression in diagrams, better reading comprehension, more effective memory recall, sharpened critical thinking, and increased creativity [6] (FIGURE 1). However, at some point, hand-drawn diagrams usually need to be transformed into corresponding digital artifacts to facilitate their application beyond a means of reasoning and communication, i.e. for process interchange, process simulation, and process automation. Unfortunately, this usually requires remodeling the hand-drawn diagrams with modeling tools, which is a non-value-adding as well as a time-intensive activity. A more efficient alternative to remodeling hand-drawn diagrams may be to apply optical character recognition (hereinafter OCR) to hand-drawn diagrams, in a similar way to how paper-based documents are transformed into digital ones. However, since no dedicated or trained IT solutions were identified for the optical recognition of process diagrams, our objective was to address this gap by investigating the effectiveness (i.e. accuracy) of existing OCR solutions, which may be applied or trained to recognize hand-drawn process elements. Similar challenges remain in the related domains of the optical character recognition of visual vocabularies, in which researchers primarily focus on the recognition of online symbols or diagrams and less on scanned images of handwritten diagrams [7].
To achieve the above-stated objective, we initially assured a large number of hand-drawn process elements by asking subjects to redraw them according to the standard. In this phase, we already limited our research to 'Business Process Model and Notation' (BPMN 2.0), a de-facto and ISO standard in process modeling [8]. Besides, BPMN also supports other BPM-related activities, including process discovery, process analysis, process redesign, and process implementation. In the second phase, we explored existing solutions for optical recognition and machine learning to find the most applicable ones for the domain of investigation. In the third phase, a mobile application based on the Angular.js framework was designed, developed, and trained to recognize hand-drawn BPMN elements. For detecting BPMN elements, we used TensorFlow, a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library and is also used for machine learning applications such as neural networks. In the last phase, the effectiveness (i.e. accuracy) of the developed solution was tested on a sub-sample of hand-drawn process elements.
Accordingly, the paper is organized as follows. The introduction chapter already identified the problem and motivation for the proposed solution, whereas the second chapter contains background on business process modeling, optical character recognition, and machine learning. The third chapter presents related work in the field, focusing on image recognition cases. The fourth chapter presents the research, which is according to the above-stated phases divided into the modeling part, implementation part, and recognition part. The fifth chapter presents the results of applying OCR on hand-drawn BPMN elements by using the implemented solution and discusses them concerning the stated objective and research questions. Conclusions, limitations, and future work are presented in the last chapter.

II. BACKGROUND
In this chapter, we discuss the fundamentals of business process modeling, optical character recognition, and machine learning to help understand the rest of this paper.

A. BUSINESS PROCESS MODELING
Business process modeling is the activity of graphically documenting and depicting business processes [9], which are commonly positioned at the center of the management and operations of modern organizations. Large organizations may manage thousands of process models in their repositories, since these form a knowledge base that provides an advantage in a corporate environment [9], [10]. Process diagrams enable employees and other related stakeholders to understand the modeled process and share such an understanding [12], as well as to find potential points of optimization [9]. Besides, process diagrams are considered a foundation to guide decision-making in business processes [13], since they convey information more efficiently and effectively than natural languages. Due to the 'picture superiority effect', information represented visually is also more likely to be remembered [5].
From a theoretical perspective, a process diagram represents a 'visual sentence' based on a process modeling notation, which generally provides a means to graphically represent process-related concepts (e.g. activities, control flow, events, resources, roles, actors, functions, organization, information, and hierarchy) in order to compose a business process and its interaction with the environment [9]. To achieve this, (formal) process modeling notations primarily consist of graphical symbols (i.e. vocabulary), definitions of the meaning of each symbol (i.e. semantics), and a set of compositional rules (i.e. grammar) [5].
Many notations enabling business process modeling exist. However, to reduce the risk of misunderstanding the conveyed information, it is preferred to model diagrams using a standardized notation that is understood by all stakeholders [12]. Among those, BPMN 2.0 is considered the de-facto standard in business process modeling [13], [14] and is nowadays the most widely used process modeling notation [9]. Process diagrams (FIGURE 2) are commonly created with the help of modeling tools and diagramming tools. A modeling tool (e.g. Signavio, Bizagi, and BPMN.io) offers advanced functionalities, such as compliance with the standard, availability of a repository, and validation support [14]. Besides, modelers also have the choice of creating process diagrams with generic diagramming tools, which usually support several modeling techniques or notations. Such tools (e.g. Microsoft Visio or Dia) are commonly extensible with templates (i.e. stencils) used to draw or paint symbols, shapes, or patterns. In contrast to dedicated tools, they commonly do not implement a standardized meta-model, so they have limited capabilities in terms of the syntactic and semantic verification of diagrams, as well as the serialization of diagrams. To summarize, dedicated modeling tools can reduce the time and effort needed for developing solutions concerning particular problem types, whilst generic diagramming tools can be applied to different notations within a wider variety of problems, yet have difficulty producing results that are relevant to a specific problem [16].
Dumas et al. [1] provide a slightly different classification of modeling tools, namely (1) pen and paper; (2) haptic modeling tools, which rely on physical artifacts; (3) single-user tools, which are further divided into general-purpose tools, such as Microsoft Visio, and specialized tools, such as the Bizagi modeler; and (4) multi-tenant tools (i.e. cloud modeling tools).

B. OPTICAL CHARACTER RECOGNITION
OCR stands for optical character recognition, a process of classifying the optical patterns contained in a digital image, whereby character recognition is achieved through segmentation, feature extraction, and classification [17]. Character recognition is one of the most interesting areas of pattern recognition and artificial intelligence, receiving increasing attention due to its wide range of applications, including invoice imaging, the legal industry, banking, healthcare, captchas, institutional repositories and digital libraries, optical music recognition, automatic number recognition, and handwriting recognition [18].
Based on the type of input, the OCR systems can be categorized as handwriting recognition and machine-printed character recognition. The latter is less challenging since characters are usually of uniform dimensions and standardized representations, and the positions of characters on the page can be predicted [19]. Besides, OCR may be classified into offline recognition and on-line recognition. In offline recognition, the source is either a physical image or a scanned form of the document, whereas in on-line recognition, the successive points are represented as a function of time and the order of strokes is also available [20].
The process of OCR is composed of a (sub)set of the following activities [19], [21]:
- Image acquisition. This activity captures an image from an external (physical) source and converts it into an appropriate form that can be processed by a computer. Approaches applied in this phase include digitalization, binarization, and compression of the image.
- Pre-processing. The objective of this activity is to enhance the quality of the acquired image; an important part of pre-processing, for example, is detecting the skewness of the acquired artifact. Techniques for skew estimation include projection profiles, the Hough transform, and nearest-neighborhood methods. In some cases, thinning of the image is also performed before the later phases are applied. Finally, the text lines present in the document can also be detected as part of the pre-processing phase, based on projections or clustering of the pixels.
- Segmentation. This activity segments the image into individual characters, which can be done in two ways, explicitly or implicitly. In implicit segmentation (i.e. internal or recognition-based segmentation), the system searches the image for components that match classes in the vocabulary (e.g. the alphabet). In contrast, explicit segmentation identifies individual segments based on 'character-like' properties.
- Feature extraction. In this stage, features that uniquely identify characters are extracted. The selection of the correct features and of the total number of features to be used is an important research field. Different types of features can be used, such as the image itself, geometrical features (e.g. loops and strokes), and statistical features (e.g. moments). Finally, techniques such as 'principal component analysis' can be applied to reduce the dimensionality of the image.
- Classification. This is the process of classifying a character into its appropriate category (i.e. recognition). The structural approach to classification is based on relationships present in image components, whereas statistical approaches use a discriminant function to classify the image. Statistical classification approaches include the Bayesian classifier, decision tree classifiers, neural network classifiers, and nearest-neighborhood classifiers [22]. Finally, there are classifiers based on the syntactic approach, which assumes a grammatical approach to composing an image from its sub-constituents.
- Post-processing. The objective of this activity is to improve the accuracy of the results, which can be done with various approaches. One approach is to use more than one classifier for the classification of the image; the classifiers can be combined in a cascading, parallel, or hierarchical manner, and their results merged using various approaches. Besides, contextual analysis can be performed to improve OCR results: the geometrical and document context of the image can help reduce the chance of errors, and lexical processing based on Markov models and dictionaries can also improve the results. OCR accuracy is still not 100%, reaching around 97% for typewritten text and 80% to 90% for handwritten text [23].
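To make the pipeline concrete, the activities above can be sketched as a toy, end-to-end script. This is an illustrative sketch only, not part of the solution described later in the paper: all function names are our own, the 'scan' is a synthetic NumPy array, and a nearest-prototype rule stands in for a real classifier.

```python
import numpy as np

def acquire():
    """Image acquisition: fabricate a 64x64 grayscale 'scan'."""
    rng = np.random.default_rng(0)
    return rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

def preprocess(img, threshold=128):
    """Binarization: 0 = ink, 1 = background."""
    return (img >= threshold).astype(np.uint8)

def segment(binary, size=16):
    """Explicit segmentation: split the page into fixed-size tiles."""
    h, w = binary.shape
    return [binary[r:r + size, c:c + size]
            for r in range(0, h, size) for c in range(0, w, size)]

def extract_features(tile):
    """Statistical features: ink density per quadrant."""
    quads = [tile[:8, :8], tile[:8, 8:], tile[8:, :8], tile[8:, 8:]]
    return np.array([1.0 - q.mean() for q in quads])

def classify(features, prototypes):
    """Nearest-prototype classification (stand-in for a real classifier)."""
    return min(prototypes,
               key=lambda label: np.linalg.norm(features - prototypes[label]))

prototypes = {"dense": np.full(4, 0.5), "sparse": np.full(4, 0.1)}
tiles = segment(preprocess(acquire()))
labels = [classify(extract_features(t), prototypes) for t in tiles]
print(len(tiles), set(labels) <= {"dense", "sparse"})  # → 16 True
```

Post-processing is omitted here; in a real system it would, for example, combine the outputs of several classifiers or apply contextual analysis to the labels.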

C. MACHINE LEARNING AND TENSORFLOW
Artificial intelligence, machine learning, and deep learning are three commonly used terms that describe software that behaves intelligently. Artificial intelligence is an umbrella term covering any technology that mimics human behavior, whereas machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that a system can learn from data, identify patterns, and make decisions with minimal or no human intervention. The machine learning process generally consists of gathering data, preparing the data, choosing a model, training, evaluation, hyperparameter tuning, and prediction (FIGURE 3). Deep learning is a subset of machine learning, composed of algorithms that permit software to train itself to perform tasks. Deep artificial neural networks are a set of algorithms reaching new levels of accuracy for many important problems, such as image recognition, sound recognition, and recommender systems. TensorFlow, the technology used in this work, focuses on machine and deep learning.
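The generic workflow listed above can be illustrated end-to-end with a deliberately small example. The following sketch implements each step with a plain-NumPy logistic-regression classifier on synthetic data; it is illustrative only and unrelated to the TensorFlow solution described later in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1) Gather data: two synthetic 2-D point clouds, 50 points per class.
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# 2) Prepare the data: shuffle, normalize, split into train/test sets.
idx = rng.permutation(100)
X, y = X[idx], y[idx]
X = (X - X.mean(axis=0)) / X.std(axis=0)
X_train, X_test, y_train, y_test = X[:80], X[80:], y[:80], y[80:]

# 3) Choose a model: logistic regression, p = sigmoid(w.x + b).
w, b = np.zeros(2), 0.0

# 4) Train with gradient descent; the learning rate is the
#    'hyperparameter tuning' knob in this tiny setting.
lr = 0.5
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    w -= lr * X_train.T @ (p - y_train) / len(y_train)
    b -= lr * (p - y_train).mean()

# 5) Evaluate on held-out data; 6) the same formula yields predictions.
pred = (1.0 / (1.0 + np.exp(-(X_test @ w + b))) > 0.5).astype(int)
accuracy = (pred == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

The split into preparation, training, and evaluation on held-out data is the same regardless of whether the model is a two-parameter regression, as here, or a deep neural network.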
TensorFlow is a machine-learning system that operates at large scale and in heterogeneous environments. It uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. TensorFlow maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore processors, general-purpose graphics processing units, and custom-designed application-specific integrated circuits (ASICs) known as Tensor Processing Units (TPUs). This architecture enables developers to experiment with novel optimizations and training algorithms.
TensorFlow uses a single dataflow graph to represent all computation and state in a machine learning algorithm, including the individual mathematical operations, the parameters and their update rules, and the input pre-processing. The dataflow graph expresses the communication between sub-computations explicitly, thus making it easy to execute independent computations in parallel and to partition computations across multiple devices. TensorFlow differs from batch dataflow systems in two respects:
- The model supports multiple concurrent executions on overlapping subgraphs of the overall graph.
- Individual vertices may have a mutable state that can be shared between different executions of the graph.
TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks [24]. Some of the companies currently using TensorFlow are Google, Airbnb, eBay, Intel, DropBox, DeepMind, AirBus, CEVA, Snapchat, SAP, Uber, Twitter, and IBM.

III. RELATED WORK
In general, recognition is a widely used method applied in many different domains, and it offers opportunities in numerous current and practical fields. For example, it can be used for fingerprint recognition [41], face recognition [42], and gesture recognition [39]. Moreover, gesture recognition in the computer accessibility domain can contribute to more efficient interaction with the computer for people with disabilities [40]. Although recognition methods can be found in different domains, our literature review focused on identifying and analyzing research that includes the optical character recognition of visual vocabularies.
The analysis of related research indicates that researchers have mostly focused on the recognition of on-line diagrams and less on the interpretation of offline diagrams, i.e. scanned images of pen-and-paper diagrams [7]. Within that, researchers have investigated the recognition of different hand-drawn graphical artifacts, including electronic circuits and components, UML diagrams, architectural drawings, and flow charts [25]. For example, in 2002, Wenyin et al. [26] investigated on-line sketchy graphics recognition. The authors divided the procedure into four phases: pre-processing, shape classification, shape fitting, and regularization. The research results show that the shape recognition and regularization algorithms provide a recognition precision of 91.2% [26].
In the domain of UML diagrams, Deufemia et al. [27] investigated the recognition of sketched UML class diagrams based on five different notation elements (class, package, association, aggregation, composition, inheritance). On average, the baseline system correctly identified 79% of the symbols, while the multilayer algorithm correctly identified 90% of them. In the electronic circuits and components area, Edwards and Chandran [28] presented an application of image processing techniques for the recognition of hand-drawn circuit diagrams. In the pre-processing phase, the scanned images of circuit diagrams were converted to bilevel images after their noise was removed. A set of morphological operations was used to obtain a clean and thin version of the images. The authors distinguished between connections, nodes, and components, which were segmented using appropriate thresholds on the pixel density. The results indicate that a node recognition accuracy of 92% was achieved on a database comprising 107 nodes, and a component recognition accuracy of 86% was achieved on a database containing 449 components. In 2016, Rabbani et al. [29] investigated the recognition of electrical symbols from an electrical diagram using artificial neural networks. The process was divided into two phases: the first phase was feature extraction using shape-based features, whereas the second phase was the classification procedure using an artificial neural network with a backpropagation algorithm. The artificial neural network was trained and tested with 20 different hand-drawn electrical images in each class. The authors concluded that the proposed method enables recognition and identification with high accuracy (Precision=0.85, Recall=0.83, and F-measure=0.83) [29]. In 2016, Patare and Joshi [30] explored a method for the recognition of sketched digital logic circuit components.
A Support Vector Machine was applied for the classification of each component of the hand-drawn diagrams. The results indicate an average circuit recognition accuracy of 83%. In 2018, Moetesum et al. [25] presented a technique for the segmentation and recognition of electronic components in hand-drawn circuit diagrams. The segmentation used a set of morphological operations on the binarized images of circuits. The characterization of each segmented component was performed by computing the Histogram of Oriented Gradients (HOG) descriptor, while the classification was performed using a Support Vector Machine (SVM). The results show a segmentation accuracy of 87.7% and a classification rate of 92%. In 2019, Lakshman et al. [31] investigated a method for the recognition of hand-drawn electronic circuit diagrams, based on (1) a feature vector combining Local Binary Patterns (LBP) and statistical features based on pixel density for component recognition; (2) a support vector machine (SVM) for the classification of components; and (3) the position and sequence of arrangement of components to determine the type of circuit. The results showed that the component recognition method achieved a recognition rate of over 99%, whereas the circuit type recognition method had a recognition rate above 85%.
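The gradient-histogram-plus-classifier idea recurring in these studies can be illustrated with a strongly simplified sketch: a single global histogram of gradient orientations computed in plain NumPy (real HOG uses local cells and blocks), with a nearest-centroid rule standing in for the SVM used in the cited work.

```python
import numpy as np

def hog_descriptor(img, bins=8):
    """Global histogram of gradient orientations, weighted by magnitude."""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % np.pi            # unsigned orientation [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi),
                           weights=magnitude)
    return hist / (hist.sum() + 1e-9)             # normalized descriptor

# Two toy 'components': a horizontal and a vertical stroke.
horizontal = np.zeros((16, 16)); horizontal[7:9, :] = 1.0
vertical = np.zeros((16, 16)); vertical[:, 7:9] = 1.0
centroids = {"horizontal": hog_descriptor(horizontal),
             "vertical": hog_descriptor(vertical)}

def classify(img):
    """Nearest-centroid rule over the descriptors."""
    d = hog_descriptor(img)
    return min(centroids, key=lambda k: np.linalg.norm(d - centroids[k]))

print(classify(horizontal), classify(vertical))  # → horizontal vertical
```

The two strokes produce gradient energy in different orientation bins, so even this crude descriptor separates them; the cited papers use far richer features and a trained SVM.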
Many researchers have investigated the automatic recognition of handwritten mathematical symbols/characters [32], mathematical expressions [33], and signs within different cultures, including Arabic and Chinese, using artificial neural networks. For example, as early as 1996, Amin et al. [34] investigated the automatic recognition of hand-printed Arabic characters using artificial neural networks.
Their approach was divided into three main steps. In the pre-processing step, the original image was captured and converted into a binary image using a 600 dpi scanner. Second, the image skeleton was traced from right to left to build a graph, from which some primitives were removed. To classify the characters, a five-layer artificial neural network was applied. The algorithm, implemented on a microcomputer in the C language, was tested with ten users. The results indicated that the quality of writing ranged from acceptable to poor, while the correct recognition rate was 92% [34]. In 2000, Amin [35] presented a technique for the recognition of Arabic text using the C4.5 machine learning system, which was composed of three main phases. The first phase was digitization and pre-processing, where the skewness of a connected document image was detected and corrected. In the next phase, feature extraction was performed, where global features of the input Arabic words were used to avoid the difficulty of the segmentation stage. In the last phase, C4.5 was used to generate a decision tree for the classification of each word. The system was tested with 1000 Arabic words in different fonts. The correct average recognition rate was 92% [35].
In 2012, Wang et al. [36] presented an effective technique for the offline recognition of handwritten Chinese text. They evaluated the recognition performance on the Chinese handwriting database CASIA-HWDB. The results, based on a test of 1015 handwritten pages, indicated that confidence transformation and combining multiple contexts significantly improved text line recognition performance. The technique achieved a character-level accuracy rate of 90.75% and a correction rate of 91.39% [36]. In 2017, Wu et al. [37] investigated handwritten Chinese text recognition using two types of character-level neural network language models: (1) feed-forward neural network language models and (2) recurrent neural network language models. Besides, both of them were combined with back-off N-gram language models to build hybrid language models. The investigation results, based on the Chinese handwriting database CASIA-HWDB, indicate that neural network language models improved the recognition performance, and that hybrid recurrent neural network language models outperformed the other language models. The performance on both the CASIA-HWDB and the ICDAR-2013 competition datasets was significantly improved. On the CASIA-HWDB test set, the character-level accuracy rate was 95.88%, whereas the correctness rate reached 95.95%. On the ICDAR-2013 test set, the character-level accuracy rate and correctness rate reached 96.20% and 96.32%, respectively [37].
In 2007, Sadri et al. [38] investigated the segmentation and recognition of unconstrained handwritten numeral strings, which present some of the most challenging problems in the area of OCR. The general framework of their proposed system was divided into the following steps: input image, pre-processing, segmentation, genetic algorithm, feature extraction, evaluation, and classification. The results indicated that the correct use of contextual knowledge improved the overall performance of the system. On average, the system obtained correct recognition rates of 95.28% and 96.42% on handwritten numeral strings using neural network and support vector classifiers, respectively [38].

IV. RESEARCH
In line with the main objective of our research, namely to investigate the effectiveness of the optical recognition of hand-drawn process elements, and focusing on BPMN 2.0, a de-facto and ISO standard in the domain, we specified a set of research questions. To achieve the stated objective and to answer these research questions, the research was performed in four phases, namely: the state-of-the-art, modeling, implementation and training, and recognition phases.

A. STATE-OF-THE-ART
Since we were unable to identify dedicated solutions for recognizing process elements, we initially tested existing OCR solutions for the effectiveness of recognizing BPMN elements without providing any treatment (i.e. machine-learning-related training) to the tested solutions. We focused on applications available on the web as well as on the application market 'Google Play'. We searched for appropriate solutions using the following search strings: ''Image recognition app'', ''recognition app'', and ''text recognition''. Based on the analysis of the search results, we tested the following applications: 'Image Recognition', 'Pen to Print', 'Search by Image', 'Smart Lens', 'Text Scanner', 'Google Reverse Image Search', and 'Google Cloud Vision API'.
Five sample objects were selected for testing the accuracy of the OCR process of the selected applications: a picture of a dog, a hand-drawn BPMN process element, a standardized BPMN process element, a handwritten letter A, and a typed letter A. The test on each object was repeated ten times, and the accuracy results are presented in Table 1.
The test results demonstrate that none of the selected solutions is able to recognize hand-drawn or computer-printed BPMN elements (0% accuracy); the most common answers returned for the BPMN elements were 'symbol', 'logo', 'circle', and 'clip art'. Correspondingly, we tested a handwritten 'lorem ipsum' text with Pen to Print, again with a resulting 100% accuracy, as evident in the following figure (FIGURE 4).

B. MODELING PHASE
The evaluation and benchmarking of different OCR algorithms and OCR tools require a standardized database of characters, i.e. images of the symbols used in the investigation [22]. Besides, the availability of a repository containing a sufficient amount of data for training and testing purposes is a fundamental quality requirement [39]. Research in the domain of OCR is mainly focused on six natural languages, namely English, Arabic, Indian, Chinese, Urdu, and Persian, with publicly available datasets for these languages, such as CEDAR, CENPARMI, HCL2000, MNIST, PE92, and UCOM [40].
However, as we anticipated, we were unable to find a suitable database of standardized process elements in hand-drawn form. Due to this, we performed empirical research in which subjects were asked to redraw standardized process elements (BPMN 2.0 compliant) by hand. A part of the questionnaire used for acquiring hand-drawn symbols is presented in the following figure (FIGURE 5, a version translated into the English language). As evident from FIGURE 5, subjects were initially asked to specify the number of BPMN diagrams they had produced, to enable a search for potential correlations between individual BPMN expertise and OCR accuracy. Afterward, they were instructed to write out the alphabet at a normal pace. These data provided us with a reference point to related work in the domain of OCR of handwritten characters, as well as a means to identify potential correlations between the OCR accuracy of text and of BPMN elements.
In the second part, subjects were asked to redraw 88 standardized depictions of process elements representing the visual vocabulary of BPMN 2.0. The notation also allows combining the semantics or visuals of individual BPMN elements, e.g. a combined compensation and loop characteristic marker [41, p. 383]. However, these combinations of visual symbols were out of the scope of the questionnaire (FIGURE 6).
In case subjects were familiar with a specific symbol, they were instructed to put a cross symbol '×' in the corner of the shape. Besides, the total time of redrawing the complete set of elements was recorded to identify potential correlations between the speed and accuracy of OCR. In total, 72 subjects participated in the research, all of whom were able to complete the questionnaire successfully. Afterward, an anonymized version of the paper-based questionnaires (FIGURE 7) was collected and digitalized, resulting in a total of 6336 hand-drawn BPMN symbols (the dataset is available at http://dx.doi.org/10.17632/d6pyngpwr5.2). The segmentation process (i.e. segmenting the images of the digitalized questionnaires into individual symbols) was performed prior to the actual OCR process by a macro that located pre-specified regions of individual symbols and transformed them into individual images.
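The macro-based segmentation described above can be approximated in code as follows. The grid geometry (page size, cell size, offsets) consists of illustrative assumptions, not the actual questionnaire layout; the sketch only shows the idea of cropping pre-specified regions of a scanned page into individual symbol images.

```python
import numpy as np

# Illustrative geometry (NOT the real questionnaire layout): a scanned
# page with an 11x8 grid of drawing cells, i.e. 88 symbols per subject.
PAGE_H, PAGE_W = 1000, 800      # page size in pixels (assumed)
ROWS, COLS = 11, 8              # 88 symbol cells
CELL_H, CELL_W = 80, 90         # cell size in pixels (assumed)
TOP, LEFT = 60, 40              # offset of the grid on the page (assumed)

def segment_page(page):
    """Crop the pre-specified symbol regions out of one page."""
    symbols = []
    for r in range(ROWS):
        for c in range(COLS):
            y, x = TOP + r * CELL_H, LEFT + c * CELL_W
            symbols.append(page[y:y + CELL_H, x:x + CELL_W])
    return symbols

page = np.zeros((PAGE_H, PAGE_W), dtype=np.uint8)   # stand-in for a scan
symbols = segment_page(page)
print(len(symbols), symbols[0].shape)  # → 88 (80, 90)
```

Because the questionnaire cells sit at fixed positions, such fixed-region cropping avoids the harder problem of general character segmentation discussed in the background chapter.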
As evident from Table 2, the BPMN expertise of the subjects, measured by the reported number of models produced, is rather low (mean=0.61; stdev=2.36), which means that the subjects were largely inexperienced in BPMN. Only 7 subjects reported any BPMN expertise, i.e. at least one BPMN diagram modeled.
The average time of rewriting the stated BPMN elements was 17 minutes and 9 seconds (stdev=270 seconds).

C. IMPLEMENTATION PHASE
Since we were unable to identify a suitable solution for the optical recognition of BPMN elements (as described in the state-of-the-art chapter), we developed a dedicated solution by performing the software engineering phases as follows.
In the requirements specification phase, we focused on identifying the functional requirements of the solution that would enable us to achieve the stated goals of the research. The following use cases were specified: capturing an image, processing an image, reviewing recognition results, reviewing the list of BPMN categories, reviewing similar elements, and reviewing the BPMN elements.
In the software design phase, we selected the stack of technologies that allowed us to implement the stated use cases. Following similar research [42], we used the TensorFlow tool to create the machine learning model. Besides TensorFlow, the following technologies and frameworks were used in the final application: AngularJS, Ionic, Cordova, and TensorFlow.js. The user interface of the developed mobile application is presented in FIGURE 8. The figure presents the main functionalities, including capturing a new image and calculating the similarity of the captured image.

D. RECOGNITION PHASE
The DNN model for the recognition of BPMN elements was part of the implementation phase. However, since it represents the focal part of our research, it is presented in a separate sub-chapter. In general, TensorFlow allows the design of different deep neural networks, including CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) structures, which have been shown to deliver better results compared to shallow network structures. The recognition model was created in the following steps.

1) TRAINING DATA
The images of individual BPMN elements were acquired from the questionnaire, which resulted from the modeling phase, as presented in FIGURE 7. The images were organized into folders, where each folder contained the images of a specific BPMN element as hand-drawn in the modeling phase.

2) RETRAINING OF THE NETWORK
To ensure quick re-training of the network, we applied the 'transfer learning' technique [43], which is based on the application of a pre-trained model to the BPMN context. The model builds on an image classifier trained on ImageNet [44], an image database organized according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images. On average, the database includes more than five hundred images per node, and the images are classified into more than a thousand different classes.
For the training of our own (BPMN) classifier, we used TensorFlow and Python. The development environment ran on a Linux operating system (Linux Mint 19.1 Cinnamon, version 4.0.10). The development was based on two examples: one from the official TensorFlow documentation and the other from an official example on Google Codelabs named 'TensorFlow For Poets' (TFP tutorial) [45]. The model was trained and re-trained with the MobileNet network, a small and efficient neural network. MobileNet is a type of convolutional neural network designed for mobile and embedded vision applications. The MobileNet model is based on depth-wise separable convolutions, a form of factorized convolution that factorizes a standard convolution into a depth-wise convolution and a 1×1 convolution called a pointwise convolution. MobileNet spends 95% of its computation time in 1×1 convolutions, which also contain 75% of the parameters. Nearly all of the additional parameters are in the fully connected layer [46]. Retraining was performed with a Python script (github.com/tensorflow/.../retrain.py) with the default parameter set.
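The parameter savings of the depth-wise separable factorization described above can be illustrated with a short calculation; the kernel size and channel counts below are typical MobileNet layer values chosen for illustration:

```python
def conv_params(k, m, n):
    """Parameters of a standard k x k convolution: k*k*m*n."""
    return k * k * m * n

def separable_params(k, m, n):
    """Depth-wise separable factorization: a k x k depth-wise convolution
    (k*k*m parameters) followed by a 1 x 1 pointwise convolution (m*n)."""
    return k * k * m + m * n

# An illustrative MobileNet-style layer: 3 x 3 kernel, 32 in, 64 out channels.
standard = conv_params(3, 32, 64)        # 18432 parameters
separable = separable_params(3, 32, 64)  # 288 + 2048 = 2336 parameters
ratio = separable / standard             # equals 1/n + 1/k^2, here about 0.13
```

The ratio `1/n + 1/k²` explains why the factorization shrinks the model roughly eight- to nine-fold for 3×3 kernels, which is what makes MobileNet practical on mobile devices.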
The input images were limited to 224 × 224 pixels, which is the most common input size of the pre-trained model. To re-train the model to be capable of recognizing BPMN elements, we used the existing Python script to analyze, calculate, and save the image feature vector (i.e. bottleneck) value for every image. This penultimate layer produces output values informative enough for the classifier to distinguish between all the classes it is aimed to recognize. The step that follows the bottleneck computation is the training of the model.
The data used for the training and testing process was split at an 80:20 ratio without applying cross-validation: 80% of the data was used for training and 20% within the testing phase. By default, the script executed the process in 4000 training steps. Each step selected 10 random images from the training set, obtained their bottleneck values from the cache, and fed them into the last layer to get predictions, which were compared to the actual labels. FIGURE 9 presents the percentages of training accuracy, validation accuracy, and entropy during the model training process. TensorFlow simplifies the comprehension and optimization of the training process. The graph in FIGURE 10 presents the percentage of images used in the current training process that have been associated with the correct class. The following figure (FIGURE 11) depicts cross-entropy results (i.e. a loss function), which show the progress of learning (lower values are preferred).
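The split and batching scheme described above can be sketched as follows; the fixed random seed and the use of the full 6336-symbol dataset are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # illustrative fixed seed

n_images = 6336                       # total hand-drawn symbols collected
indices = rng.permutation(n_images)

split = int(0.8 * n_images)           # 80:20 split, no cross-validation
train_idx, test_idx = indices[:split], indices[split:]

# 4000 training steps, each drawing a batch of 10 random training images:
batches = [rng.choice(train_idx, size=10, replace=False) for _ in range(4000)]
```

In the actual script the per-image bottleneck values are cached, so each of the 4000 steps only evaluates the final layer on its batch of 10 images.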
In the case of training on BPMN elements, we obtained a final test accuracy of 94.2%, whereas a usual accuracy value is between 90% and 95%. The output of the training process is a map of bottlenecks for individual images. The developed mobile application applied the trained model via two files: 'output_labels.txt', which includes the labels of the trained elements, and 'output_graph.pb', which includes the retrained model capable of recognizing BPMN elements.
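At inference time, the class index predicted by the retrained graph is mapped back to a human-readable BPMN label via 'output_labels.txt'. A minimal sketch of this lookup, assuming a plain one-label-per-line file and a model output given as a probability vector (the label names are illustrative):

```python
def load_labels(text):
    """Parse the contents of output_labels.txt: one label per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def predict_label(probabilities, labels):
    """Map the highest-probability class index to its BPMN label."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return labels[best], probabilities[best]

labels = load_labels("exclusive gateway\ntimer event\nuser task\n")
label, confidence = predict_label([0.08, 0.90, 0.02], labels)
# label == 'timer event', confidence == 0.90
```

In the mobile application this lookup happens in TensorFlow.js after running the graph from 'output_graph.pb' on the captured image.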

V. RESULTS
The following subchapters provide and discuss the results following the stated research questions.

A. ACCURACY OF OCR
Concerning RQ1, Table 3 summarizes the OCR results of the investigated subset of BPMN elements, ordered by accuracy in descending order.
The research, based on non-trained elements, shows that the most recognizable element (i.e. the one with the highest accuracy) is the exclusive gateway, which is characterized by an 'x' sign that does not occur in any other investigated BPMN element. The BPMN element with the next most successful recognition is the 'timer event'. Since we included only one category of that element in the process (i.e. only the start timer event), the results mostly proved to be correct. In contrast, the message element, which we included in three different categories (i.e. start message event, non-interrupting start message event, and intermediate message event), proved to be poorly recognized. Only the non-interrupting start message event demonstrated a higher percentage due to its dotted line, making it more recognizable than the rest. The 'human task' element is third in order and has the highest score in the category of tasks. The only two elements that comprise a square frame are the investigated gateways, where the complex gateway also obtained a high recognition score of 87.6%. Due to the numerous BPMN events that were investigated, the results turned out to be somewhat lower in that category; namely, the error, message, condition, and escalation elements obtained results with less than 50% accuracy.
These results may be compared to the related work of Deufemia et al. [27], in which the authors investigated the accuracy of the recognition of the main UML class diagram elements, reaching a recognition rate of 90% when applying a multilayer ML algorithm. The recognition rate by shape ranged between 65% for the association element and 99% for the class element. In our case, the best recognition rate was achieved for the exclusive gateway element at 95%, whereas the lowest was for the manual task at 16.8%. However, it has to be stressed that the BPMN vocabulary is more complex than that of UML, including many small-size and complex icons, which specialize generic BPMN concepts (as in the case of tasks) and are especially difficult to hand-draw.
In line with RQ1.1, the following table (Table 4) presents the less accurately recognized elements from Table 3 together with their false positives. Table 4 reveals that the less accurately recognized hand-drawn BPMN elements were most commonly substituted with elements of the same BPMN group (e.g. events, activities, gateways), which share the same shape (as an example, events were wrongly substituted with other types of BPMN events).
As also evident from Table 3, BPMN elements were recognized at different levels of accuracy, ranging from 16.8% to 95%, whereas individual images were recognized on a scale of 0% to 100%. The following table (Table 5) presents examples of hand-drawn BPMN elements that were recognized in a specific quartile of accuracy (due to less frequent results, the lowest two quartiles were joined together). A visual analysis of the elements in Table 5 reveals that in the cases of broken lines, as well as in the cases of intersections between lines, the accuracy results tend to be lower.

B. IMPACT OF MODELLING EXPERTISE
To test whether previous expertise in BPMN (measured by the number of produced BPMN diagrams) impacts the modeling time and the accuracy of OCR (RQ2), we performed an independent-samples t-test (Table 6), in which we compared subjects who self-reported expertise in BPMN (i.e. the number of created BPMN diagrams > 0) with those who did not self-report any expertise in BPMN (i.e. the number of created BPMN diagrams = 0).
A t-test was conducted to determine whether the accuracy and modeling time of subjects with BPMN expertise (M = 0.56, SD = 0.18 for accuracy; M = 709, SD = 168 for modeling time in seconds) were significantly different from those of the subjects who did not report any BPMN expertise (M = 0.59, SD = 0.16 for accuracy; M = 810, SD = 81 for modeling time in seconds). The t statistic was not significant with respect to accuracy, t(70) = −0.22 at p = 0.83, as well as for modeling time, t(70) = −1.32 at p = 0.22 (Table 6).
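An independent-samples t-test of this kind can be reproduced with standard tooling; the sketch below uses scipy.stats.ttest_ind with equal variances assumed, and the two small samples are synthetic, for illustration only:

```python
from scipy import stats

# Synthetic per-subject accuracy scores for two illustrative groups:
with_expertise = [0.52, 0.61, 0.48, 0.59, 0.55, 0.60, 0.57]
without_expertise = [0.58, 0.63, 0.55, 0.60, 0.62, 0.57, 0.59, 0.61]

# Independent-samples t-test, equal variances assumed (as in Table 6):
result = stats.ttest_ind(with_expertise, without_expertise, equal_var=True)
significant = result.pvalue < 0.05  # reject H0 of equal means at alpha = 0.05
```

The same call, applied to the real per-subject accuracies and modeling times, yields the t and p values reported above.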

C. IMPACT OF HANDWRITING ACCURACY
To gain insights into whether the accuracy of letter recognition has any effect on the accuracy of BPMN element recognition (RQ3), we compared the accuracy of BPMN elements to the accuracy of letters of individual subjects (Table 7).
An analysis of Table 7 reveals that there is no correlation between the accuracy of OCR of BPMN elements and the accuracy of OCR of letters.
A Pearson's r correlation was applied to examine the relationship between the OCR accuracy of BPMN elements (M = 0.71, SD = 0.18) and the OCR accuracy of letters (M = 0.6, SD = 0.51). A non-significant positive correlation was obtained, r = 0.76, p > 0.01, indicating that there is no significant correlation between the same measure (OCR accuracy) across the investigated groups of elements (BPMN elements, letters).
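Such a correlation analysis can be reproduced with scipy.stats.pearsonr; the per-subject accuracy pairs below are synthetic, for illustration only:

```python
from scipy import stats

# Synthetic per-subject OCR accuracies (BPMN elements vs. letters):
bpmn_accuracy = [0.71, 0.65, 0.80, 0.58, 0.74, 0.69]
letter_accuracy = [0.55, 0.62, 0.49, 0.66, 0.58, 0.60]

r, p = stats.pearsonr(bpmn_accuracy, letter_accuracy)
# r lies in [-1, 1]; a small |r| or a large p value indicates that there is
# no reliable linear relationship between the two accuracy measures.
```

Applied to the real per-subject pairs from Table 7, the same call yields the r and p values reported above.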

D. SCANNED VERSUS PHOTO
According to the last research question (RQ4), we tested whether there is a difference in the recognition of elements recorded with a phone camera and their scanned versions, where in both cases no additional image processing was performed. FIGURE 12 depicts sample elements acquired with a scanner and a camera. A t-test was conducted to determine whether the accuracy of OCR of BPMN elements acquired with a scanner (M = 0.67; SD = 0.45) was significantly different from the accuracy of OCR of BPMN elements acquired with a camera (M = 0.28; SD = 0.40). The t statistic was significant, t(70) = 2.14 at p = 0.045 (2-tailed, equal variances assumed), indicating that scanning BPMN elements yields better accuracy than acquiring them with a mobile phone camera.
According to FIGURE 12, we may assume that the background of an image significantly impacts the accuracy of OCR (preferring white backgrounds). However, this assumption has limitations since we trained the model with scanned images only.

VI. CONCLUSION
In this paper, we presented the results of our application development and research, both related to the optical character recognition (OCR) of BPMN elements. As stated in the introduction and the 'state-of-the-art' sections of the paper, no dedicated or trained solutions for the optical character recognition of BPMN elements have been identified, even though modelers still use paper-based modeling approaches, especially in the early phases of BPM lifecycle activities.
In line with the goal of the research, we designed a solution for OCR of handwritten BPMN elements by adapting (i.e. training) a generic machine-learning-based solution, namely TensorFlow. The results of applying the designed solution to a set of handwritten BPMN elements were afterward evaluated and compared to related work. We demonstrated that it is possible to train an existing solution for OCR of BPMN elements to recognize hand-drawn BPMN symbols in a similar way as the characters of an alphabet. The chosen TensorFlow machine learning framework was demonstrated to be suitable for creating a specific model for recognizing different categories of images. We successfully re-trained the network and demonstrated that it is attainable to design a model that recognizes BPMN elements. Since TensorFlow is a rather novel technology, we faced some challenges in implementing and training the model with the use of JavaScript technology. Nevertheless, the variety of TensorFlow-based development approaches is vast, of which network re-training is just one possibility.
The obtained results are not directly comparable to the outcomes of the related research, since they differ in the applied research approaches. In our approach, we investigated already pre-segmented BPMN symbols, which were collected via questionnaires. Nevertheless, some general parallels between the results of our work and related research can be drawn. In general, the average accuracy rate (61%) of all investigated BPMN elements is lower compared to similar research [26], [27], [28], [30], [31], [34], [36], [37], [38]. In the related research, the lowest accuracy rate was identified in research [27], where the system correctly identified 79% of the UML symbols. Slightly more encouraging are the results of comparing the accuracy rates of individual BPMN elements with the outcomes of existing research. For example, only two studies [37], [38] report a higher accuracy rate than the accuracy rate for recognizing the BPMN element 'Exclusive gateway', which achieved a recognition accuracy of 95%. The recognition rates of the BPMN elements 'Non-interrupting start time event' and 'User task' are slightly lower but comparable with the results presented in [26], [34], [35], [36], while the accuracy rate for recognizing the element 'Complex gateway' is comparable to the results presented in research [30] and [31]. Unfortunately, the results of the remaining investigated BPMN elements indicated a lower accuracy rate (< 76%) when compared to similar studies.

A. IMPLICATIONS
As stated in the introduction, a more efficient alternative to remodeling hand-drawn diagrams may be to apply optical character recognition (OCR) of (hand-drawn) diagrams in a similar way to how paper-based documents are transformed into digital ones. In light of this, we demonstrated that existing technologies (e.g. TensorFlow) might be trained or adapted for recognizing BPMN elements. These insights may be taken up by vendors, who could include such functionality in their modeling tools. An enterprise-ready solution for OCR of process diagrams would speed up the modeling process and let modelers freely choose the preferred modeling approach, e.g. hand-drawn modeling in early phases and the use of modeling tools when the specification of operational models is required.
Secondly, since different BPMN elements resulted in different levels of accuracy, these insights may also be considered when specifying or adapting the visual vocabularies of notations, assuring appropriate visual distances between depictions of individual elements. Besides, our findings demonstrate that the 'quality' of handwriting, as well as expertise in the investigated notation, does not yield benefits concerning the accuracy of optically recognizing the corresponding visual elements.
Finally, our research approach as well as the results may motivate researchers in other domains (e.g. software engineering diagrams, business diagrams, electrical circuit diagrams, etc.) to perform similar research.

B. LIMITATIONS AND FUTURE WORK
The interpretation of the findings of our research has to consider the following limitations. Firstly, we based the training model of the machine learning process on hand-drawn images of BPMN elements obtained from students inexperienced in the notation. A different profile of subjects could impact the resulting hand-drawn images. Besides, the images were hand-drawn in a dedicated research instrument (see FIGURE 5), so they might differ from elements' depictions when applied in actual process diagrams. Since we focused on individual elements rather than on diagrams (i.e. visual sentences), the segmentation process was applied before performing the actual machine learning: the input to the OCR process consisted of images of individual symbols.
Moreover, although we were able to collect 6336 hand-drawn BPMN symbols from 72 subjects, the final sample is still rather small for machine learning purposes, especially when considering the classes of individual elements. Finally, our research investigated the accuracy of individual BPMN elements, whereas, in reality, they always appear within diagrams.
In our future work, we aim to address some of the above-stated limitations as follows. We plan to increase the diversity of data available for training models by applying data augmentation, a strategy that significantly increases the diversity of training data without actually collecting new data. While our research investigated only the visual vocabulary part of the BPMN notation, we also aim to investigate the impacts of visual grammar, which represents an essential part of visual sentences, i.e. BPMN diagrams. Besides, to improve the machine-learning-based model, we aim to generate and search for additional sources of hand-drawn BPMN elements. One alternative is to use the diagrams produced by students during their curricula. And finally, since different notations are in use in the information systems domain, we aim to investigate the accuracy of the optical recognition of other notations, e.g. UML and entity-relationship modeling.
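Data augmentation of the kind planned above can be sketched with simple geometric transformations; the flips and one-pixel shifts below are illustrative choices, not the augmentation policy of any particular framework:

```python
import numpy as np

def augment(symbol):
    """Generate augmented variants of one hand-drawn symbol image.

    `symbol` is a 2-D grayscale array; the transformations (mirrors and
    one-pixel shifts) are illustrative -- rotations, scaling, or elastic
    distortions could be added in the same way. Note that mirroring may
    change the semantics of asymmetric symbols, so a real augmentation
    policy would have to be chosen per element class.
    """
    return [
        np.fliplr(symbol),            # horizontal mirror
        np.flipud(symbol),            # vertical mirror
        np.roll(symbol, 1, axis=0),   # shift down by one pixel
        np.roll(symbol, 1, axis=1),   # shift right by one pixel
    ]

symbol = np.zeros((224, 224), dtype=np.uint8)
symbol[10, 20] = 255                  # a single stroke pixel
augmented = augment(symbol)           # four variants of the original image
```

Each collected symbol image would thus contribute several training samples, partly compensating for the small size of the hand-drawn dataset.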
GREGOR POLANČIČ received the Ph.D. degree in software engineering and information systems in 2008. He is currently an Associate Professor of informatics with the University of Maribor. He has almost two decades of experience in BPM, having started to investigate BPMN with its first version in 2004. In 2008, he was one of the first authors to publish an article dedicated to the experiences and practical use of BPMN. The article was published in the ''BPM and Workflow Handbook'' in association with the Workflow Management Coalition (WfMC). Overall, his bibliography comprises 300 works, of which more than 30 are journal articles. He has also been teaching BPM since 2005 in several undergraduate and postgraduate courses and has been an invited BPM/BPMN lecturer at several universities and in global companies. In 2019, he was a Visiting Professor with the Vienna University of Economics and Business, lecturing the Business Process Implementation course. He has also participated in several BPMN-related local and international workshops and projects and researches BPM from different technological and user aspects. In recent years, he has been consulting and periodically authoring for Orbus Software and Good e-Learning, companies located in the U.K. In 2019, he received the BPM Best CEE Forum Paper Award.
SLAVICA JAGEČIĆ received the master's degree in IT from the University of Maribor, Slovenia, in 2019. She is currently a Software Developer with Blocksi Inc., a global company dedicated to providing the K-12 education market with an innovative cloud content filtering and classroom screen monitoring system with big data content analysis. Her research interests include JavaScript-based solutions, business process modeling, and machine learning.
KATJA KOUS received the Ph.D. degree in computer science from the University of Maribor, Slovenia, in 2016. She is currently a Teaching Assistant with the Faculty of Electrical Engineering and Computer Science, Institute of Informatics, University of Maribor. She started investigating the practical usage of BPMN during her undergraduate studies and continued doing so during her postgraduate studies. She has more than ten years of experience in teaching practical work with BPMN as a teaching assistant in undergraduate and postgraduate courses (e.g., business process modeling, IT governance, and standards and quality). She was also involved in many industrial projects related to BPMN. VOLUME 8, 2020