CAPTCHA Recognition Using Deep Learning with Attached Binary Images

Abstract: Websites can increase their security and prevent harmful Internet attacks by providing CAPTCHA verification for determining whether the end-user is a human or a robot. Text-based CAPTCHAs are the most common and are designed to be easily recognized by humans and difficult to identify by machines or robots. However, with the dramatic advancements in deep learning, it has become much easier to build convolutional neural network (CNN) models that can efficiently recognize text-based CAPTCHAs. In this study, we introduce an efficient CNN model that uses attached binary images to recognize CAPTCHAs. By making a number of copies of the input CAPTCHA image equal to the number of characters in that image and attaching a distinct binary image to each copy, we build a new CNN model that can recognize CAPTCHAs effectively. The model has a simple structure and small storage size and does not require the segmentation of CAPTCHAs into individual characters. After training and testing the proposed CAPTCHA recognition CNN model, the achieved experimental results reveal the strength of the model in CAPTCHA character recognition.


Introduction
The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) [1][2][3][4] is a type of test to differentiate between humans and computer programs on Internet websites. CAPTCHA attempts to provide security against bots and can appear in many forms, including text, image, audio, and video. Conducting research on recognizing CAPTCHA images is important because it helps identify weak points and loopholes in the generated CAPTCHAs and consequently leads to the avoidance of these loopholes in newly designed CAPTCHA-generating systems, thus boosting the security of the Internet.
Text-based CAPTCHAs are still a popular and powerful tool against malicious computer program attacks due to their extensive usability and easy implementation. The majority of text-based CAPTCHAs consist of English uppercase letters (A to Z), English lowercase letters (a to z), and numerals (0 to 9). Other new large character sets, such as Chinese characters, have also been used. In our proposed model, when the number of characters in a CAPTCHA changes, only the number of CAPTCHA copies and their attached binary images will change while the architecture of the recognition CNN is unchanged. Furthermore, the simplified architecture of our model considerably reduces the model's storage size. The storage size of the model remains small and unaffected even when the number of characters in the CAPTCHA is increased. This is not the case with other segmentation-free models whose internal architecture needs to be modified when the number of characters in the CAPTCHA image increases; in this case, the models also become increasingly complex, and their storage size increases accordingly. We evaluate our proposed CAPTCHA recognition algorithm on two dataset schemes, namely Weibo and Gregwar. Weibo CAPTCHAs are collected manually from the famous Chinese Weibo social media website, while Gregwar CAPTCHAs are generated from the free and strong Gregwar CAPTCHA-generating library. We analyzed the security of the two schemes. All targeted CAPTCHAs were broken with high success rates without using any segmentation. We achieved success rates of 92.68% and 54.2% on the Weibo and Gregwar CAPTCHA schemes, respectively. Moreover, the proposed algorithm could be useful for researchers in other fields whose datasets have a similar structure. The main contributions of this paper include:

• We propose an Attached Binary Images (ABI) algorithm that is used to recognize CAPTCHA characters without the need for segmenting CAPTCHAs into individual characters.

• With the adoption of the ABI algorithm, we significantly reduce the storage size of our model and simplify the entire architecture of the CNN model.

• We conduct our experiments on two CAPTCHA dataset schemes. The proposed model efficiently improves the recognition accuracy and simplifies the structure of the CAPTCHA recognition system as compared to other competitive CAPTCHA recognition methods.
The remainder of this paper is organized as follows. Section 2 introduces several CAPTCHA recognition and segmentation methods and algorithms. Section 3 presents the basic idea of the proposed CAPTCHA recognition algorithm and describes the structure and parameters of the recognition CNN. Section 4 shows the structure of the adopted datasets, evaluates the accuracy of proposed CAPTCHA recognition model, provides a comparison of the results, and presents a discussion of the proposed CAPTCHA recognition algorithm. The conclusion of the study is shown in Section 5.

Related Work
Segmentation-based CAPTCHA recognition systems are still widely used for CAPTCHA breaking purposes. The segmentation step is the main component in the recognition process of these segmentation-based models. Several algorithms have been proposed to segment text-based CAPTCHAs into separate characters. Zhang et al. [19] used the vertical projection technique [20][21][22] for CAPTCHA segmentation. They improved the vertical projection to deal with conglutination characters by combining the size features of characters and their locations with the vertical projection histogram. They also covered the segmentation of different types of conglutination. Chellapilla and Simard [23] used the connected component algorithm [24,25] to segment several CAPTCHA schemes, including Yahoo and Google, and achieved a success rate between 4.89% and 66.2%. However, vertical projection and connected component algorithms involve numerous preprocessing operations that are computationally expensive and time consuming. Hussain et al. [26] presented another CAPTCHA segmentation method in which the segmentation is based on recognition. First, an artificial neural network (ANN) was trained to recognize manually cropped CAPTCHA characters; then, this trained ANN was used to segment a CAPTCHA image and crop its characters by using sliding windows. This segmentation method involves the application of the trained ANN to many extracted sub-windows to obtain their percentage of confidence, which could increase the segmentation time.
The character recognition module is also considered a crucial component of segmentation-based CAPTCHA recognition systems because it can influence the recognition accuracy of these systems. Sakkatos et al. [27] used the template matching [28,29] approach to recognize characters by comparing the separate characters with template characters using the characters' coefficient values. Errors arising from similarities between characters are considered a weak point in recognition via template matching unless more advanced solutions are incorporated. Chen et al. [30] introduced a CAPTCHA character recognition method called selective learning confusion class (SLCC). SLCC uses a two-stage Deep Convolutional Neural Network (DCNN) frame to recognize CAPTCHA characters. First, the characters are classified using the all-class DCNN. Then, a confusion relation matrix and a set partition algorithm are used to construct confusion class subsets. This CAPTCHA character recognition method has high character recognition accuracy, especially for confusion-class characters; however, assigning a new DCNN to each confusion class subset could considerably increase the storage size of the whole system.
To avoid the drawbacks of ineffective CAPTCHA segmentation algorithms, researchers have recently begun to adopt deep-learning-based segmentation-free CAPTCHA recognition systems for recognizing CAPTCHAs directly without segmentation. The authors of [31,32] used a segmentation-free CAPTCHA recognition CNN that is trained to recognize all CAPTCHA characters simultaneously. A specific number of neurons (equal to the number of character classes) in the output layer was assigned to each CAPTCHA character for classification. This recognition model has fast recognition speed and avoids CAPTCHA segmentation. However, as the number of CAPTCHA characters increases, the number of neurons in the output layer also increases; consequently, the storage size increases as well. Another segmentation-free multi-label CNN model was presented by Qing et al. [33] to recognize CAPTCHAs with connected and distorted characters. The internal structure of this CNN model was designed to consider the correlation between adjacent characters to improve recognition accuracy. However, this model uses a separate set of convolutional and fully connected layers for each character on the CAPTCHA, which greatly complicates the architecture and increases the storage size of the model when the number of CAPTCHA characters is increased.
The authors in [34,35] used a segmentation-free model that combines a CNN and an attention-based recurrent neural network (RNN) to accomplish CAPTCHA recognition. The CNN part extracts features from a CAPTCHA image and produces feature vectors, and a Long Short-Term Memory (LSTM) network transforms the feature vectors into a text sequence. This model has high recognition speed and can be used to recognize CAPTCHAs of various lengths. However, the model's architecture is relatively complex because it consists of CNN and RNN parts, which could result in increased storage size.

Proposed Method
In this section, we describe the basic idea of our proposed CAPTCHA recognition algorithm. We also explain the characteristics of the adopted attached binary images. Then, we present the internal structure of the CAPTCHA recognition CNN and its training parameters.

Basic Concept of Proposed Recognition Approach
For simplification, we use the term CRABI, which stands for CAPTCHA Recognition with Attached Binary Images, to refer to our proposed CAPTCHA recognition algorithm. The main idea behind the proposed method is to make several copies of the input CAPTCHA image and then attach external distinct binary images to these copies. We refer to the binary images as attached binary images (ABIs). We use the resultant CAPTCHA copies to train a CNN model to recognize CAPTCHA characters. We refer to this proposed CAPTCHA recognition CNN as "CRABI-CNN" throughout this paper. A description of the proposed CRABI algorithm is shown in Figure 2. We begin by explaining the recognition process during the training phase. The testing phase is then clarified.

Training Phase
Suppose that the original training set consists of M text-based CAPTCHA images, and the number of characters in each CAPTCHA image is n. The proposed algorithm is explained as follows:

1. Making Copies: We make n copies of each CAPTCHA image in the original training set. We end up with n identical copies of the training set.

2. Preparing ABIs: We define n external distinct fixed-size binary (black and white) images. These distinct binary images are used to represent the position or location information of the CAPTCHA characters. Each of the n distinct binary images is always responsible for locating exactly one character of the n CAPTCHA characters. That is, the first binary image is always responsible for locating the first character in each CAPTCHA image, the second binary image is always responsible for locating the second character in each CAPTCHA image, and so on. The specific design of the binary images is presented in Section 3.2.

3. Attaching ABIs: We attach the distinct binary images to the CAPTCHA copies. The first distinct binary image is attached to the first copy of each CAPTCHA image in the training set, the second distinct binary image is attached to the second copy of each CAPTCHA image in the training set, and so on. We end up with a new training set consisting of n × M images. We refer to this new dataset as the "resultant dataset", and each image in this dataset is referred to as a "resultant CAPTCHA copy". Each resultant CAPTCHA copy consists of a CAPTCHA image and its ABI.

4. Labeling: We add labels to each resultant CAPTCHA copy in the resultant dataset. Every resultant CAPTCHA copy is given only one character class as its label. The ABI of each resultant CAPTCHA copy determines the location of the character class on the CAPTCHA copy that is added as a label. In this way, labels can be added directly to all resultant CAPTCHA copies in the resultant dataset.

5. Training: The resultant CAPTCHA copies and their labels are used to train a CNN model to classify and recognize CAPTCHA characters. This CNN is trained to use the attached binary images for locating CAPTCHA characters and the labels for recognizing character classes.

Figure 2 provides a full description of the proposed algorithm.
The steps mentioned above can be summarized as follows: make n copies of the entire original training set; create n external distinct binary images; attach the first distinct binary image to all images of the first copy of the training set, the second distinct binary image to all images of the second copy, and so on; attach labels to each resultant CAPTCHA copy; and use the resultant dataset to train a CNN for classifying CAPTCHA characters. A complete framework description of the whole pipeline in the training phase is shown in Figure 3.
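The summarized steps can be sketched as follows. This is a minimal NumPy sketch, not the authors' code; the function name build_resultant_dataset is illustrative, and it assumes grayscale CAPTCHAs stored as an (M, 96, 280) array, the n ABIs as an (n, 96, 40) array, and each ABI attached to the left of its copy.

```python
import numpy as np

def build_resultant_dataset(captchas, labels, abis):
    """Expand an (M, H, W) CAPTCHA set into an (n*M, H, W + abi_width) set.

    captchas : array of M grayscale CAPTCHA images, shape (M, H, W)
    labels   : list of M strings, each of length n (the CAPTCHA text)
    abis     : array of n attached binary images, shape (n, H, abi_width)
    """
    n = abis.shape[0]
    images, chars = [], []
    for img, text in zip(captchas, labels):
        for i in range(n):
            # Attach the i-th ABI to the left of the i-th copy of the CAPTCHA.
            images.append(np.hstack([abis[i], img]))
            # The label of this resultant copy is the i-th character only.
            chars.append(text[i])
    return np.stack(images), chars
```

Each original CAPTCHA thus contributes n resultant copies that differ only in their ABI and single-character label.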

Testing Phase
To test CAPTCHA images, we perform the following:

1. We make n copies of the input CAPTCHA image.

2. We attach each one of the n distinct binary images to one of the n copies of the input CAPTCHA image.

3. We submit the resultant n CAPTCHA copies directly to the trained CAPTCHA recognition CNN.

4. The CNN locates and classifies the characters of each resultant CAPTCHA copy, and the desired output is obtained.

A complete framework description of the whole pipeline in the testing phase is shown in Figure 4.
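The testing steps above can be sketched as follows, assuming a trained Keras-style model whose predict method returns softmax probabilities over the character classes; the function name and normalization are illustrative assumptions.

```python
import numpy as np

def recognize_captcha(model, captcha, abis, classes):
    """Predict all n characters of one CAPTCHA with a trained CRABI-CNN.

    captcha : grayscale image, shape (H, W)
    abis    : the n attached binary images, shape (n, H, abi_width)
    classes : ordered sequence of character classes matching the softmax output
    """
    n = abis.shape[0]
    # One resultant copy per character position, ABI attached on the left.
    batch = np.stack([np.hstack([abis[i], captcha]) for i in range(n)])
    batch = batch[..., np.newaxis] / 255.0        # add channel dim, normalize
    probs = model.predict(batch)                  # shape (n, num_classes)
    # Row i of probs classifies the i-th character; concatenate the argmaxes.
    return "".join(classes[k] for k in probs.argmax(axis=1))
```

The n rows of the prediction come back in position order, so joining the per-row argmax classes directly yields the CAPTCHA text.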
We use four-character CAPTCHA images of the shape 96 × 280 to implement our CAPTCHA recognition system. First, we make four copies of each CAPTCHA image. Second, we prepare four distinct binary images of the shape 96 × 40. Third, we attach the first binary image to the left of the first copy of each CAPTCHA image, the second binary image to the left of the second copy of each CAPTCHA image, and so on. The resultant CAPTCHA copies acquire the shape 96 × 320. Lastly, labels are given to each resultant CAPTCHA copy. That is, the resultant CAPTCHA copy whose ABI is responsible for locating the first character is given the first character's class as its label, and the resultant CAPTCHA copy whose ABI is responsible for locating the second character is given the second character's class as its label. The same procedure is repeated for all resultant CAPTCHA copies. We now have four resultant CAPTCHA copies for each input CAPTCHA image. Each resultant CAPTCHA copy contains an ABI that represents the location (or position) of the CAPTCHA character and a label that represents the character's class. These resultant CAPTCHA copies are then used to train our CRABI-CNN model to recognize and classify label characters. In this way, we have avoided the need to segment CAPTCHA images into isolated characters and have simplified the entire structure of the CAPTCHA recognition CNN model.
Our CRABI-CNN, to some extent, is similar to the character recognition CNNs that are usually used in the character recognition part of segmentation-based CAPTCHA recognition systems. However, in character recognition CNNs, CAPTCHA input images are required to be segmented first; then, the separate characters are submitted individually to the character recognition CNN. By contrast, in our proposed CRABI algorithm, we do not need to segment CAPTCHA images because we replaced the segmentation step by making copies of CAPTCHA images and attaching binary images to these copies. The ABIs help in locating the characters to be recognized, and the resultant CAPTCHA copies are submitted individually to the CRABI-CNN for character recognition.

Characteristics of Attached Binary Images Adopted in Our Captcha Recognition Model
Before explaining our adopted ABI design, we need to mention that the design of ABIs is not unique and can differ from one researcher to another; any researcher can use a design that is suitable for his or her application. For example, ABIs can simply be binary images of the numbers from 1 to n that distinguish the n character locations of a CAPTCHA image.
In our case, an ABI is a binary image (an image that consists only of black pixels with a value of 0 and white pixels with a value of 255) whose number of rows r equals the number of rows of the input CAPTCHA image, which is 96 in our case. The number of columns c of an ABI is given by the following formula:

c = n × w,

where n is the number of characters of the input CAPTCHA and w is the width of the range of white columns in the ABI. In our case, the number of characters n is equal to 4, and the range of white columns w is set to 10. Thus, the number of columns c of an ABI is 40, and its shape is 96 × 40. Given that we have four characters in the input CAPTCHA image, we need to make four ABIs (one for each character). The characters in the CAPTCHA image are represented by x ∈ {numeral digits or English letters}, and each character is given an index i ∈ {1, 2, 3, 4} representing its location (or order) in the CAPTCHA image, such that the first character of the CAPTCHA image is denoted by x_1, the second character is denoted by x_2, and so on. Each character x_i is associated with its corresponding ABI. The white and black regions of the ABIs can be calculated with:

c_B(i) = (i − 1) × w + 1,  c_F(i) = i × w,

where c_B(i) is the column number at which the range of white columns begins and c_F(i) is the column number at which it ends. For example, for the third CAPTCHA character x_3, c_B(3) = 21 and c_F(3) = 30, which means that columns 21 to 30 are white, whereas the remaining columns (1 to 20 and 31 to 40) are black. The ABIs are depicted in Figure 2.
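Under these formulas, the ABIs can be generated as in the following NumPy sketch; columns are 0-indexed in code, so the 1-indexed white band c_B(i)..c_F(i) becomes the slice (i-1)*w : i*w. The function name is illustrative.

```python
import numpy as np

def make_abis(n=4, w=10, rows=96):
    """Create the n attached binary images, each of shape (rows, n*w).

    For character index i (1-based), columns (i-1)*w+1 .. i*w are white (255)
    and all remaining columns are black (0), matching c_B(i) and c_F(i).
    """
    c = n * w                                       # total ABI width: c = n x w
    abis = np.zeros((n, rows, c), dtype=np.uint8)   # start all-black
    for i in range(1, n + 1):
        abis[i - 1, :, (i - 1) * w : i * w] = 255   # white band for position i
    return abis
```

With the paper's values (n = 4, w = 10, rows = 96), this yields four 96 × 40 images whose white bands occupy columns 1-10, 11-20, 21-30, and 31-40, respectively.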

Structure and Parameters of the Proposed CRABI-CNN
The feature extraction part of the proposed CRABI-CNN is based on the Model-5 architecture introduced in [36]. The CRABI-CNN architecture consists of 17 convolutional layers, 5 maxpooling layers, 1 flatten layer, 1 dropout layer, and 1 output softmax layer, and their parameters are set to accelerate the training process and improve feature extraction. A full description of the CRABI-CNN architecture is shown in Table 1, and the entire architecture of the CRABI-CNN model is depicted in Figure 5. The output softmax layer contains a number of neurons equal to the number of character classes. For example, if a CAPTCHA scheme uses 62 character classes (10 numeral digits, 26 uppercase English letters, and 26 lowercase English letters) to represent its characters, then the output softmax layer of our CRABI-CNN will contain only 62 neurons, such that each neuron corresponds to exactly one character class. In the training of our proposed CAPTCHA recognition CNN, the cross-entropy loss function is used to measure differences between predicted and true classes. The Adam optimizer is adopted to optimize the loss function. The learning rate is set to 0.00001, a value found by experimentation to provide improved optimization. The training batch size is 128 images per batch, and the training process runs for 120 epochs.
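As a sketch of how these training settings map onto code (using the modern tf.keras API rather than the Keras 2.2.0/TensorFlow 1.9.0 of the experiments): the 17-convolution Table 1 backbone is not reproduced here and is passed in as a placeholder, and the dropout rate shown is an assumption, not a value from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

def compile_crabi_cnn(feature_extractor, num_classes):
    """Attach the single softmax head and the stated training settings.

    feature_extractor : any Keras model standing in for the Model-5
                        backbone of Table 1 (not reproduced here).
    num_classes       : number of character classes (e.g., 62).
    """
    model = keras.Sequential([
        feature_extractor,
        layers.Flatten(),
        layers.Dropout(0.5),                 # dropout rate is an assumption
        layers.Dense(num_classes, activation="softmax"),  # one neuron per class
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),  # lr = 0.00001
        loss="categorical_crossentropy",     # cross-entropy loss
        metrics=["accuracy"],
    )
    return model

# Training as described in the text: 128 images per batch, 120 epochs.
# model.fit(x_train, y_train, batch_size=128, epochs=120,
#           validation_data=(x_val, y_val))
```

Because the single softmax head depends only on the number of character classes, the head (and hence the parameter count) is unchanged when the number of CAPTCHA characters grows.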

Experiments and Results
In this section, we explain two CAPTCHA scheme datasets used to train, validate, and test CRABI-CNN and clarify the labeling process. Next, the accuracy of the CRABI algorithm is calculated. Then, the results of comparing the CRABI algorithm with other CAPTCHA recognition systems are presented, and a discussion of the advantages and shortcomings of the CRABI algorithm is provided. Notably, we have trained, validated, and tested our CRABI model in the following environment: Floydhub cloud server, Tesla K80 GPU with 12 GB memory, 61 GB RAM, 100 GB SSD, Cuda v9.1.85, CuDNN 7.1.2, TensorFlow 1.9.0, and Keras 2.2.0 on Python 3.6.

Used Dataset and Labeling Description
Given that no publicly available standard datasets of CAPTCHA images can be used for CAPTCHA recognition purposes, we need to obtain CAPTCHA images either by collecting them from real online websites or by generating them using CAPTCHA generation software. To conduct our experiments, we adopt two CAPTCHA dataset schemes: Weibo (https://www.weibo.com/) and Gregwar (https://packagist.org/packages/gregwar/captcha). Figure 6 shows samples of the two CAPTCHA schemes.

Weibo Captcha Scheme
Weibo is one of the largest Chinese social media platforms. It is among the most popular websites globally as ranked by Alexa. In 2018, Weibo's monthly active users exceeded 400 million. The Weibo CAPTCHA scheme uses resistance mechanisms, including distortion, character overlapping, rotation and warping. Its CAPTCHAs contain four characters with character classes of either numeral digits or uppercase English letters. The excluded characters are 0, 1, 5, D, G, I, Q, U [7]. We manually collect and label 70,000 random Weibo CAPTCHA images as a dataset.

Gregwar Captcha Scheme
Gregwar is a free and open-source CAPTCHA generating library in PHP. It is among the strongest CAPTCHA schemes that show effective resistance against CAPTCHA breaking bots. It incorporates several security mechanisms, such as dense noise lines, color background, and rotation. We generate CAPTCHAs of four characters with character classes of either numerical digits, uppercase English letters, or lowercase English letters. We randomly generate 70,000 Gregwar CAPTCHA images as a dataset. All of the four characters in each generated CAPTCHA image are selected randomly, and we have verified that no repeated or duplicated CAPTCHA images are present.
Each CAPTCHA dataset scheme is divided into 50,000 CAPTCHA images as a training set, 10,000 CAPTCHA images as a testing set, and 10,000 CAPTCHA images as a validating set. Each CAPTCHA image in the two dataset schemes contains a label included in the image's name. This label consists of a four-character text or string that represents the four characters found in this CAPTCHA. First, all CAPTCHA images in the two dataset schemes are converted into grayscale for simplification. Second, the images are reshaped into 96 × 280. Notably, the selection of the training, validating, and testing set images is performed randomly to avoid any subjective effects.
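A minimal loading sketch for these preprocessing steps, assuming Pillow and NumPy; taking the four-character label from the file's base name (e.g., "ABCD.png") is an assumed convention for how the label is "included in the image's name".

```python
import os

import numpy as np
from PIL import Image

def load_captcha(path):
    """Load one CAPTCHA: grayscale, resized to 96 x 280, label from filename."""
    img = Image.open(path).convert("L")     # grayscale for simplification
    img = img.resize((280, 96))             # PIL takes (width, height)
    label = os.path.splitext(os.path.basename(path))[0][:4]
    return np.asarray(img, dtype=np.float32), label
```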
To train our CRABI-CNN, we make four copies of each CAPTCHA image and attach binary images of size 96 × 40 to each copy so that the size of the resultant CAPTCHA copies becomes 96 × 320. Then, each resultant CAPTCHA copy is assigned a label consisting of only one character.
Since we have made four resultant CAPTCHA copies of each original CAPTCHA image, the resultant CAPTCHA copy with the first ABI is assigned the first character of the four-character text of the original CAPTCHA image as its label, the resultant CAPTCHA copy with the second ABI is assigned the second character of the four-character text of the original CAPTCHA image as its label, and so on. After finishing the labeling process, we end up with new dataset schemes, each of which consists of 280,000 images of the resultant CAPTCHA copies (4 copies × 70,000 CAPTCHA images) with their corresponding character labels. These resultant dataset schemes are used individually to train, validate, and test our CRABI-CNN. The training set of each resultant dataset scheme consists of 200,000 images (4 copies × 50,000 CAPTCHA images), the testing set consists of 40,000 images (4 copies × 10,000 CAPTCHA images), and the validating set has 40,000 images (4 copies × 10,000 CAPTCHA images).

Accuracy and Training Description
We use the resultant Weibo scheme dataset to train CRABI-CNN for 120 epochs. The accuracy of CRABI-CNN can be evaluated using two criteria: total character recognition accuracy and overall CAPTCHA image accuracy. For total character recognition accuracy, all characters of all CAPTCHA images are classified individually, and accuracy is calculated by dividing the number of correctly recognized characters by the total number of characters of all CAPTCHA images. For overall CAPTCHA image accuracy, all four characters of a single CAPTCHA image must be classified correctly for the CAPTCHA image to be recognized correctly; if one of the four characters of a single CAPTCHA image is wrongly classified, then the recognition result of this CAPTCHA image is considered false. Afterward, the number of correctly recognized CAPTCHA images is calculated and divided by the total number of CAPTCHA images. Table 2 shows detailed results of the accuracies of the training, validating, and testing sets of the Weibo and Gregwar datasets. Figure 7 shows the training and validating total character recognition accuracies of the Weibo and Gregwar datasets over 120 epochs of training. After a training period of 120 epochs on the resultant Weibo dataset, we reach a testing total character recognition accuracy of 97.89%, such that 39,156 characters out of the 40,000 resultant testing set characters are recognized correctly. We also reach an overall CAPTCHA testing accuracy of 92.68%, such that 9268 CAPTCHAs out of the 10,000 original testing set CAPTCHA images are correctly recognized. After a training period of 120 epochs on the resultant Gregwar dataset, we achieve a testing total character recognition accuracy of 85.28%, such that 34,111 characters out of the 40,000 resultant testing set characters are recognized correctly. We also obtain an overall CAPTCHA testing accuracy of 54.20%, such that 5420 CAPTCHAs out of the 10,000 original testing set CAPTCHA images are correctly recognized.
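The two accuracy criteria can be computed as in the following sketch, assuming predictions and true labels are flattened so that every four consecutive characters belong to one CAPTCHA; the function name is illustrative.

```python
import numpy as np

def captcha_accuracies(pred_chars, true_chars, n=4):
    """Compute the two accuracy criteria used in the text.

    pred_chars, true_chars : flat sequences of character predictions/labels,
        ordered so that every n consecutive entries belong to one CAPTCHA.
    Returns (total character accuracy, overall CAPTCHA accuracy).
    """
    pred = np.asarray(pred_chars).reshape(-1, n)
    true = np.asarray(true_chars).reshape(-1, n)
    correct = (pred == true)
    char_acc = correct.mean()                   # fraction of correct characters
    captcha_acc = correct.all(axis=1).mean()    # all n characters must match
    return char_acc, captcha_acc
```

The `all(axis=1)` reduction encodes the strict criterion: a single misclassified character makes the whole CAPTCHA count as wrong, which is why overall CAPTCHA accuracy is always at most the character accuracy.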
Table 2 shows that the testing and validating accuracies of the Gregwar dataset are lower than those of the Weibo dataset. These results are expected because the defense mechanisms adopted by each CAPTCHA scheme differ. For instance, the Weibo CAPTCHA scheme contains only 28 character classes and always has a white background without any noise lines passing through characters. Meanwhile, the Gregwar CAPTCHA scheme has 62 character classes and contains diverse foreground and background colors in addition to dense noise lines passing through characters. The strong defense mechanisms used by Gregwar CAPTCHA images greatly contribute to reducing the testing and validating accuracies of CRABI-CNN.
The total number of trainable and non-trainable parameters of the CRABI-CNN model is 6,670,812 for the Weibo scheme and 7,193,086 for the Gregwar scheme. The number of trainable parameters is larger in the Gregwar scheme CRABI-CNN because the Gregwar scheme has more character classes and consequently more neurons in the output layer of CRABI-CNN. The size of the CRABI-CNN weights on the hard disk is 25.5 MB and 27.5 MB for the Weibo and Gregwar schemes, respectively.

Comparison Results
To show the strong and weak points of the CRABI-CNN model, we compare it with other common CAPTCHA recognition algorithms on the same datasets. Two of the most popular models currently used for recognizing text and CAPTCHA images are the multilabel CNN and the Convolutional Recurrent Neural Network (CRNN).
The multilabel model that we use for comparison consists of exactly the same layers shown in Table 1, with four softmax layers on the output instead of only one softmax layer. Each of the four softmax layers is responsible for recognizing one corresponding character of the four-character CAPTCHA image. The structure of this multilabel-CNN model is similar to that of model-5 CNN proposed in [36]. The Adam optimizer with a learning rate of 0.00001 is used to optimize the cross-entropy loss functions of this CNN.
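A sketch of the four-softmax-head arrangement in the tf.keras functional API; the shared Table 1 backbone is passed in as a placeholder, and the layer names and input shape handling are illustrative.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_multilabel_cnn(backbone, num_classes, n_chars=4):
    """Multilabel comparison model: one shared backbone, four softmax heads.

    backbone : stand-in for the shared convolutional layers of Table 1.
    Each of the n_chars heads classifies one character position.
    """
    inp = keras.Input(shape=(96, 320, 1))
    features = layers.Flatten()(backbone(inp))
    # One softmax output layer per character position.
    outputs = [
        layers.Dense(num_classes, activation="softmax", name=f"char_{i + 1}")(features)
        for i in range(n_chars)
    ]
    model = keras.Model(inp, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss=["categorical_crossentropy"] * n_chars,  # one loss per head
    )
    return model
```

With 62 character classes, the four heads contribute 4 × 62 = 248 output neurons, which is the source of the larger parameter count reported for the multilabel model in Table 3.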
We have trained this multilabel-CNN model on the Weibo and Gregwar dataset schemes separately. Each dataset scheme consists of 50,000 CAPTCHA images as a training set, 10,000 CAPTCHA images as a testing set, and 10,000 CAPTCHA images as a validating set. Before training, we convert the CAPTCHA images into grayscale and reshape them to 96 × 320. We have trained the multilabel-CNN model with each dataset separately for 120 epochs. The second model used in our comparison is the CRNN model, which is based on the models in [37,38]. In this CRNN model, we use eight convolutional layers, five maxpooling layers, two batch normalization layers, and two bidirectional gated recurrent unit (GRU) layers. All convolutional layers have a kernel size of 3 × 3, except for the last one, which has a kernel size of 2 × 2, and the number of filters in these convolutional layers increases from 64 through 128 to 512. All maxpooling layers are of size 2 × 2, and each of the bidirectional GRU layers has 128 units. The input CAPTCHA images are converted to grayscale and reshaped to 64 × 256, and the parameters of the convolutional and maxpooling layers are set to obtain a sequence of seven 512-dimensional feature vectors, which is forwarded to the GRU layers. The connectionist temporal classification (CTC) [39] loss function is used to train this CRNN model.
Table 3 shows the results of comparing our CRABI-CNN model with the multilabel-CNN model and the CRNN model. The comparison results in Table 3 indicate that the testing total character recognition accuracy of our proposed CRABI-CNN model is better than that of the multilabel-CNN model in both CAPTCHA schemes. In addition, the testing overall CAPTCHA recognition accuracy of the CRABI model is the highest among the three models in the Weibo and Gregwar dataset schemes. There is a difference of 644 correctly recognized Weibo CAPTCHA images and 297 correctly recognized Gregwar CAPTCHA images between our CRABI model and the multilabel model, and a difference of 163 correctly recognized Weibo CAPTCHA images and 422 correctly recognized Gregwar CAPTCHA images between our CRABI model and the CRNN model. These results show the superiority of the testing accuracy of our CRABI model. Moreover, Table 3 indicates that the number of trainable and non-trainable parameters and the size of weights of the CRABI-CNN model are much smaller than those of the multilabel-CNN and CRNN models in both CAPTCHA schemes. This result is expected because the CRABI-CNN model contains only one softmax output layer with a limited number of neurons equal to the number of character classes adopted by a CAPTCHA scheme. However, the multilabel-CNN model contains four softmax output layers, such that each softmax layer contains a number of neurons equal to the number of character classes adopted by a CAPTCHA scheme. For example, if a four-character CAPTCHA scheme has 62 character classes (10 numeral digits, 26 English uppercase letters, and 26 English lowercase letters), then each softmax output layer in the multilabel-CNN model will contain 62 neurons, and the total number of neurons in the output layer will be 4 × 62 = 248 neurons.
This increase in the number of neurons in the output layers of the multilabel model will increase the number of trainable and non-trainable parameters and the size of the multilabel model. Table 3 also shows that the CRNN model has the largest storage size among the three models because it includes the size of CNN and RNN layers. Table 3 indicates that the average time of one training epoch of the CRABI model is longer than that of multilabel and CRNN models in both CAPTCHA schemes. The reason is that the training set used for the CRABI model is the resultant training set, which is four times the original training set since we have made four resultant CAPTCHA copies of each original CAPTCHA image in the original training set. This increase in the dataset caused the training time to increase by almost four times because the number of characters found in CAPTCHA images is four. Moreover, the testing time for the CRABI model is the longest among all models due to the same increase in the testing set of our CRABI model.

Discussion of the Proposed Crabi Algorithm
Any proposed algorithm or approach has its benefits and shortcomings. In this section, we show the advantages of our CRABI algorithm that make it a good choice for CAPTCHA recognition. We also discuss several disadvantages that must be considered.

CAPTCHA Breaking Ability
Our proposed method achieved relatively high success rates for both targeted schemes, as shown in Table 2. A CAPTCHA scheme can be considered broken when the success rate of an automated attack reaches 1%, according to [40]. We have successfully broken several resistance mechanisms found in both CAPTCHA schemes that are commonly adopted by many popular CAPTCHA schemes, including distortion, character overlapping, dense noise lines, rotation, warping, and colored backgrounds. The results in Table 2 also show that the defense mechanisms adopted by the Gregwar scheme are stronger than those of the Weibo scheme. The Gregwar CAPTCHA scheme incorporates strong security mechanisms, such as dense noise lines, diverse foreground and background colors, and a wider range of character classes, which make its CAPTCHAs difficult even for humans to recognize.

Avoiding Segmentation
One of the advantages of the proposed CRABI algorithm is the avoidance of segmenting CAPTCHA images into individual characters. By attaching ABIs to CAPTCHA copies, CRABI-CNN can be trained to simultaneously locate and recognize characters without segmentation. Segmentation-based CAPTCHA recognition systems usually suffer from inefficient CAPTCHA segmentation techniques that can adversely affect their performance.
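As a minimal sketch of the copy-and-attach step, the following assumes grayscale CAPTCHA arrays and uses a simple banded binary pattern to make each copy's ABI distinct; the exact ABI patterns and the `abi_width` parameter are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def make_crabi_copies(captcha, num_chars, abi_width=8):
    """Make num_chars copies of a CAPTCHA image, each with a distinct
    binary image (ABI) attached along the right edge.

    captcha: 2-D grayscale array of shape (H, W), values in [0, 1].
    Returns an array of shape (num_chars, H, W + abi_width).
    """
    h, _ = captcha.shape
    copies = []
    for i in range(num_chars):
        # A simple distinct binary pattern per character position:
        # only the i-th horizontal band of the ABI strip is set to 1.
        abi = np.zeros((h, abi_width))
        band = h // num_chars
        abi[i * band:(i + 1) * band, :] = 1.0
        copies.append(np.hstack([captcha, abi]))
    return np.stack(copies)

# Example: a 40x100 CAPTCHA with 4 characters -> 4 copies of 40x108.
dummy = np.random.rand(40, 100)
batch = make_crabi_copies(dummy, num_chars=4)
print(batch.shape)  # (4, 40, 108)
```

Each copy is then paired with the label of the character position its ABI encodes, so a single-output CNN can learn to read the indicated character.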

Small Storage
CRABI-CNN contains only one softmax layer in the output, with a number of neurons equal to the number of character classes adopted by a CAPTCHA scheme. This feature means that even if the number of characters in the input CAPTCHA image is increased, only the number of ABIs and CAPTCHA copies increases accordingly, and the CAPTCHA recognition CNN still has only one output softmax layer with the same number of neurons. As a result, the number of trainable and non-trainable parameters, the weights, and the storage size of CRABI-CNN are fixed and do not grow as the number of characters in CAPTCHA images increases. A different case applies to many other CAPTCHA recognition systems, such as multilabel-CNNs. Multilabel-CNNs contain multiple softmax output layers and many output neurons, and the number of softmax output layers and neurons directly depends on the number of characters in the CAPTCHA images. Consequently, the number of trainable and non-trainable parameters, the weights, and the storage size of these CNNs increase as the number of characters in the CAPTCHA images increases.
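To make the storage argument concrete, a back-of-the-envelope comparison of the final fully connected (softmax) stage can be sketched; the 1024-dimensional feature vector is a hypothetical assumption for illustration, not the paper's actual architecture:

```python
# Rough parameter count of the final dense (softmax) stage, assuming a
# hypothetical 1024-dimensional feature vector produced by the CNN.
features = 1024
num_classes = 62
num_chars = 4

def dense_params(in_dim, out_dim):
    # weights + biases of one fully connected layer
    return in_dim * out_dim + out_dim

crabi = dense_params(features, num_classes)                    # one head
multilabel = num_chars * dense_params(features, num_classes)   # n heads

print(crabi)       # 63550
print(multilabel)  # 254200
```

Only the single-head count applies to CRABI-CNN no matter how many characters the CAPTCHA contains, which is why its parameter count and storage size stay fixed.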

Simplicity and Flexibility
The structure of CRABI-CNN is (and remains) simple and flexible, and it is unchanged when the number of characters in the CAPTCHA image increases or decreases. The internal structure of CRABI-CNN does not need to be modified when the number of CAPTCHA characters changes; only the number of ABIs and CAPTCHA copies is adjusted to match the new number of characters. In contrast, in multilabel CAPTCHA recognition CNNs, the number of softmax output layers must be modified to equal the new number of characters in the CAPTCHA image, and the number of neurons in the output layers must be updated accordingly. This task involves modifying the internal structure of the recognition CNN, thereby increasing its size and complicating its internal structure as the number of characters in the CAPTCHA image increases.

Long Training and Testing Time
The CRABI algorithm still suffers from some shortcomings that must be considered. In CRABI-CNN, the training set used for training is increased by a factor of n, where n is the number of characters in the CAPTCHA image, because we need to make n copies of each input CAPTCHA image in the original training set and attach binary images to each copy. This increase in the training set leads to an increase in the training time of each epoch of CRABI-CNN by approximately a factor of n, thus increasing the training time needed to reach the required accuracy. In addition, the testing set increases in the same manner as the training set. This increment, in turn, will increase the testing time needed by the CRABI model.

Memory Use
The increase in the original training set by a factor of n as required by our CRABI algorithm uses considerable memory (RAM) during the training phase. However, the capabilities of recent hardware resources (e.g., GPUs, RAMs, SSDs) used in deep learning applications have become increasingly powerful and robust, and the increase in the size of datasets is no longer a serious issue.

Conclusions
In this study, we present a new segmentation-free algorithm for recognizing CAPTCHA images. The algorithm is based on deep learning and uses ABIs with copies of the CAPTCHA image to locate and recognize its characters. The adoption of ABIs in the proposed model decreases the overall size of the recognition system and reduces the complexity of the CNN model structure because the number of neurons in the output softmax layer is fixed and independent of the number of characters in the CAPTCHA image. Furthermore, the avoidance of the segmentation step by attaching ABIs, the adoption of a strong feature extraction CNN architecture, and the use of ABIs for locating the characters of CAPTCHA images contribute significantly to increasing the CAPTCHA character recognition accuracy.
The proposed algorithm is evaluated on two dataset schemes with about 70,000 CAPTCHA images in each scheme. The experimental results show that the proposed algorithm achieves higher CAPTCHA recognition accuracy than the state-of-the-art multilabel-CNN and CRNN models on both CAPTCHA scheme datasets, and its storage size is much smaller than that of both models. The proposed algorithm can be considered a new fundamental algorithm for recognizing CAPTCHAs, alongside the multilabel-CNN, CRNN, and segmentation-based CAPTCHA recognition models.