EBOC: Ensemble-Based Ordinal Classification in Transportation

Learning the latent patterns of historical data efficiently in order to model the behaviour of a system is a major need for making the right decisions. For this purpose, machine learning has already shown promise in transportation, as it has in many other areas such as marketing, finance, education, and health. However, many classification algorithms in the literature assume that the target attribute values in the datasets are unordered, so they lose the inherent order between the class values. To overcome this problem, this study proposes a novel ensemble-based ordinal classification (EBOC) approach, which uses bagging and boosting (AdaBoost algorithm) methods as a solution to the ordinal classification problem in the transportation sector. This article also compares the proposed EBOC approach with the ordinal class classifier and traditional tree-based classification algorithms (i.e., C4.5 decision tree, RandomTree, and REPTree) in terms of accuracy. The results indicate that the proposed EBOC approach achieves better classification performance than the conventional solutions.


Introduction
Machine learning plays an important role in many prediction problems by constructing a model from explored dataset. The most common task in learning process is classification. Classification is the process of assigning an input item in a collection to predefined classes by discovering relationships among instances in the training set. Classification has a wide range of applications, such as document categorization, medical diagnosis, fraud detection, pattern recognition, sentiment analysis, risk assessment, and signal processing.
Transportation is a sector that focuses on the movement of people, animals, and goods from one location to another. Developments in the field of transportation reveal a need to discover associations and to capture the complex, nonlinear relations underlying vast amounts of transportation data. Because of this need, machine learning techniques, especially classification algorithms, have in recent years begun to be used in the transportation sector as an interdisciplinary approach. The underlying goals of these solutions are to predict traffic flow [1], classify vehicle images [2], identify different transportation modes [3], analyse the severity of traffic incidents [4], mitigate unfavourable environmental impacts (e.g., by optimizing energy usage [5]), develop autonomous driving systems [6], and improve the productivity and efficiency of transportation systems.
The relationship between the class labels in the dataset to which the machine learning algorithms are applied influences the classification performance. In the literature, the ordinal classification approach has been proposed for data whose class attribute values involve some order or ranking. Ordinal classification predicts the label of a new ordinal sample by taking the ranking relation among the classes into consideration [7]. An example of an ordinal class attribute is a size attribute that has the values "large," "medium," and "small." It is clear that there is an order among those values and that we can write large > medium > small. Although there are many classification studies [2][3][4][5][6] in the transportation sector, to the best of our knowledge, there has been no prior detailed investigation of ordinal classification in this sector. To address this gap, the study presented in this article focuses on the application of ordinal classification algorithms to real-world transportation datasets. Some transportation studies contain an ordinal response variable; for example, the injury severity of traffic accidents can be categorized as fatal > serious > slight; similarly, traffic volume can be classified as high > medium > low.
Meanwhile, ensemble learning has recently been preferred in machine learning for the classification task because of the high prediction ability it provides. Ensemble learning is a machine learning technique which combines a set of base learning models to get a single final prediction [8]. These learning models can be any classification algorithms, such as neural network (NN), Naive Bayes (NB), decision tree, support vector machine (SVM), regression, and k-nearest neighbour (KNN). Many studies in the literature have stated that ensemble learners improve the prediction performance of individual learning algorithms. There exist various ensemble-learning methods: bagging, boosting, stacking, and voting. In this study, bagging and boosting (AdaBoost algorithm) methods are selected, due to their popularity, for the solution of the ordinal classification problem in the transportation sector.
The novelty and main contributions of this article are as follows: (i) it provides a brief survey of ordinal classification and ensemble learning, which has been shown to improve the prediction performance of traditional classification algorithms, (ii) it is the first study in which ordinal classification methods have been implemented in the transportation sector, (iii) it proposes a novel ensemble-based ordinal classification (EBOC) approach for transportation, and (iv) it presents experimental studies conducted on twelve different real-world transportation datasets to demonstrate that the proposed EBOC approach shows better classification results than both the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy.
The remainder of this article is structured as follows: in the following section, related literature and previous works on the subject are summarized. Section 3 gives background information about ordinal classification and ensemble learning. This section also explains utilized tree-based algorithms such as C4.5 decision tree, RandomTree, and REPTree in detail. In Section 4, the proposed ensemble-based ordinal classification (EBOC) approach for transportation sector is defined. Section 5 gives the description of transportation datasets used in this study. The application of traditional algorithms and proposed method on the transportation datasets and the experimental results of them with discussions are also presented in this section. Moreover, all obtained results were validated by three statistical methods to ensure the significance of differences among the classifiers on the datasets, including multiple comparisons (Friedman test and Quade test) and pairwise comparisons (Wilcoxon signed rank test). Finally, the last section gives some concluding remarks and future directions.

Related Work
In machine learning, there are two main types of tasks: supervised learning and unsupervised learning. The classification process, which is one of the supervised learning techniques, is divided into two categories: nominal classification (where no order is assumed between the classes) and ordinal classification (where the ordinal relationship between different class labels should be taken into account). The difference between ordinal and nominal classification is not remarkable in the case of binary classification, owing to the fact that there is always an implicit order in "positive class" and "negative class." In multiclass classification problems, standard classification algorithms for nominal classes can be applied to ordinal prediction problems by discarding the ordering information in the class attribute. However, this approach does not take advantage of the inner structure of the data and some information that can potentially improve the predictive performance of a classifier is lost since it ignores the existing natural order of the classes. The literature presents three different approaches for the ordinal classification problem: binary classification, regression methods, and specific techniques. The ordinal classification paradigm is summarized in the graph in Figure 1. In this article, a novel ordinal classification study was performed for the transportation sector.
The first approach for the ordinal classification process is to convert an ordinal classification problem into several binary classification problems. In studies of this type [9,10], two-class classification algorithms are applied to ordinal-valued datasets after transforming the k-class problem into a set of k-1 binary subproblems. For example, in the study presented in [10], the researchers proposed a novel approach which reduces the problem of classifying ordered classes to the standard two-class problem. They introduced a data replication method which is then mapped into neural networks and support vector machines. In the experiments, they applied the proposed method on both artificial and real datasets for gene expression analysis. Li and Lin [11] developed a reduction framework for ordinal classification that consists of three steps: (i) extracting extended examples from training examples by using weights, (ii) training a classifier on the extended examples with any binary classification algorithm, and (iii) constructing a ranking rule from the binary classifier.
As a second approach, regression methods [12,13] can be used to deal with the ordinal classification problem, since regression models are designed for continuous data. In this method, categorical ordinal data is converted into a continuous data scale and then regression algorithms are applied on this transformed data as a postprocessing step. However, a disadvantage of this approach is that the natural order of the class values is discarded and the inner structure of the samples is lost. In [12], the two most commonly used ordinal logistic models were applied on medical ordinal data: the proportional odds (PO) form of an ordinal logistic model and the forward continuation ratio (CR) ordinal logistic model. Rennie and Srebro [13] applied two general threshold-based constructions (the logistic and hinge loss) on the one million MovieLens dataset. The experimental results stated that their proposed approach shows more accurate results than traditional classification and regression models.
In the last approach, problem-specific techniques [14][15][16][17][18] were developed for ordinal data classification by modifying present classification algorithms. The main advantage of this approach is to retain the order among the class labels. However, some of them present some complexities in terms of implementation and training. These are complex and require nontrivial changes in the training methods such as modification of the objective function or using a threshold-based model. Keith and Meneses [14] proposed a novel technique called Barycentric Coordinates for Ordinal Classification (BCOC) which uses barycentric coordinates to represent ordinal classes geometrically. They applied their proposed method on the field of sentiment analysis and presented effective results for complex datasets. Researchers in another study [17] presented a novel heuristic rule learning approach with monotonicity constraints including two novel justifiability measures for ordinal classification. The experiments were performed to test the proposed approach and the results indicated that the novel method showed high prediction performance by guaranteeing monotone classification with low rule set increase.
Owing to their high prediction performance, ensemble-learning techniques have begun to be preferred in ordinal classification [19][20][21][22] as well as in nominal classification problems. Hechenbichler and Schliep [20] proposed an extended weighted k-nearest neighbor (wkNN) method for the ordinal class structure. In their study, a weighted majority vote mechanism was used for the aggregation process. In another study [21], an enhanced ensemble of support vector machines method was developed for ordinal regression. The proposed approach was implemented on benchmark synthetic datasets and was compared with a kernel-based ranking method in the experiments. Lin [23] introduced a novel threshold ensemble model and developed a reduction framework to reduce ordinal ranking to weighted binary classification by extending SVM and AdaBoost algorithms. The results of all these studies show that the ensemble methods perform well on the datasets and provide better performance than the individual methods.
Differently from existing studies, our work proposes a novel ensemble-based ordinal classification approach including bagging and boosting (AdaBoost algorithm) methods. Also, the present study is the first study in which an ordinal classification paradigm is implemented on real-world transportation datasets to model the behaviour of transportation systems.

Background Information
In this section, background information about ordinal classification, classification algorithms, and ensemble learning is presented to provide the context for this research.

Ordinal Classification.
In most classification problems, the target attribute values to be predicted are usually assumed to have no ordering relation between them. However, the class labels of some datasets have an inherent order. For example, when predicting the price of an object, ordered class labels such as "expensive," "normal," and "cheap" can be used. The order among these class labels is clearly understood and denoted as "expensive" > "normal" > "cheap." For this reason, classification techniques differ according to nominal and ordinal data types.
Nominal Data. The data which have no quantitative value is named nominal data. To label variables, nominal scales are utilized such as vehicle types (i.e., car, train, and bus) or document topics (i.e., science, business, and sports).
Ordinal Data. In ordinal data, the values have a natural order. In this data type, the order of values is significant such as data obtained from the use of a Likert scale.
Ordinal classification, which is proposed for the prediction of ordinal target values, is one of the most important classification problems in machine learning. This paradigm aims to predict the unknown class values of an attribute that have a natural order. In the ordinal classification problem, the ordinal dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ has a set of $N$ items with input feature space $X$, and the class attribute $Y = \{C_1, C_2, \ldots, C_K\}$ has $K$ class labels with an order $C_K > C_{K-1} > \ldots > C_1$, where $>$ denotes the relation of ordering. In other words, an example $(x, y)$ is composed of an input vector $x \in X$ and an ordinal label (i.e., rank) $y \in Y = \{1, 2, \ldots, K\}$. The problem is to predict the example $(x, y)$ as rank $k$, where $k = 1, 2, \ldots, K$.
In a decision tree-based ordinal classification, data is transformed from a k-class ordinal problem to k-1 binary class problems that encode the ordering of the class labels, as the first step. The ordinal dataset with classes $C_1, C_2, \ldots, C_k$ is converted into binary datasets by discriminating $C_1, \ldots, C_i$ against $C_{i+1}, \ldots, C_k$, which represents the test $C_x > C_i$. In other words, the upward unions of classes are considered progressively in each stage of binary dataset construction. Then a standard tree-based learning algorithm is employed on the derived binary datasets to construct k-1 models. As a result, each model predicts the cumulative probability of an instance belonging to a certain class. To predict the class label of an unseen instance, the probability for each ordinal class $P(C_i)$ is estimated by using the k-1 models. The estimation of the probability for the first ordinal class label depends on a single classifier, $1 - P(\text{target} > C_1)$. The probability of the last ordinal class is given by $P(\text{target} > C_{k-1})$. In the middle of the range, the probability is calculated by a pair of classifiers, $P(\text{target} > C_{i-1}) - P(\text{target} > C_i)$, where $1 < i < k$. Finally, we choose the class label with the highest probability.
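The probability computation described above can be illustrated with a minimal Python sketch. It assumes the k-1 binary models have already produced their estimates of P(target > C_i); the function names `ordinal_class_probabilities` and `predict_rank` are hypothetical and introduced only for illustration.

```python
def ordinal_class_probabilities(p_greater):
    """Combine k-1 binary estimates into k ordinal class probabilities.

    p_greater[i - 1] is assumed to hold the estimate of P(target > C_i)
    for i = 1, ..., k-1 (illustrative input, not produced by a real model here).
    """
    k = len(p_greater) + 1
    probs = [1.0 - p_greater[0]]                       # first class: 1 - P(target > C_1)
    for i in range(1, k - 1):                          # middle classes
        probs.append(p_greater[i - 1] - p_greater[i])  # P(target > C_{i}) - P(target > C_{i+1})
    probs.append(p_greater[-1])                        # last class: P(target > C_{k-1})
    return probs


def predict_rank(p_greater):
    """Return the 1-based rank of the class with the highest probability."""
    probs = ordinal_class_probabilities(p_greater)
    return max(range(len(probs)), key=probs.__getitem__) + 1


if __name__ == "__main__":
    estimates = [0.9, 0.6, 0.1]   # made-up outputs of three binary models (k = 4)
    print(ordinal_class_probabilities(estimates))
    print(predict_rank(estimates))
```

Because the k-1 models are trained independently, their estimates are not guaranteed to be monotone, so a middle-class difference can come out negative; the sketch does not correct for this.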
The core ideas behind using decision tree algorithms for ordinal classification problems are threefold.
First, Frank and Hall [7] reported that the decision tree-based ordinal classification algorithm resulted in a significant improvement over the standard version on 29 UCI benchmark datasets. They also experimentally confirmed that the performance gap increases with the number of classes. Most importantly, their results showed that decision tree-based ordinal classification was generally able to produce a much simpler model with better ordinal prediction accuracy.
Second, to consider the inherent order of class labels, other standard classification algorithms such as KNN [20], SVM [21], and NN require modification (nontrivial changes) in the training methods. Traditional regression techniques can also be applied on ordinal valued data due to their ability to classify interval or ratio quantity values. However, their application to truly ordinal problems is necessarily ad hoc [7]. In contrast, the key feature of using decision tree for ordinal classification is that there is no need to make any modification on the underlying learning algorithm. The core idea is simply to transform the ordinal classification problem into a series of binary-class problems.
Third, owing to their speed, high precision, and ease of understanding, decision tree algorithms are widely used in classification, as well as in ordinal classification. Ordinal classification is sensitive to noise in data. Even a few noisy samples in the ordinal dataset can change the classification results of the overall system. However, in the presence of noisy and missing data, a pruned decision tree can reflect the order structure information very well with good generalization ability. The benefits of implementing decision tree-based ordinal classification are not limited to these. It builds white box models, so the trees constructed by the algorithm can be visualized and the classification results can be easily explained by Boolean logic. In addition, the most important features, which are near the root node, are emphasized via the construction of the tree for the ordinal prediction task.

Tree-Based Classification Algorithms.
In this work, three tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree) are used as base learners to implement the ordinal classification process on transportation datasets that consist of ordered class values.

C4.5 Decision Tree.
Decision tree is one of the most successful classification algorithms, which predicts unknown class attributes using a tree structure grown with a depth-first strategy. The structure of a decision tree consists of nodes, branches, and leaves that represent attributes, attribute values, and class labels, respectively. In the literature, there are several decision tree algorithms such as C4.5, ID3 (iterative dichotomiser), CART (classification and regression trees), and CHAID (chi-squared automatic interaction detector). In this study, the C4.5 decision tree algorithm was used, due to its popularity, for the ordinal classification problem in the transportation sector.
The first step in the C4.5 decision tree algorithm is to specify the root of the tree. The attribute which gives the most determinant information for the prediction process is selected for the root node. To determine the order of features in the decision tree, the information gain formula is evaluated for each attribute as defined in

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v),$$

where $S_v$ is the subset of states $S$ for attribute $A$ with value $v$. The entropy indicates the impurity of a particular attribute in the dataset as defined in

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

where $i$ is a state, $p_i$ is the possibility of the outcome being in state $i$ for the set $S$, and $n$ is the number of possible outcomes. The attribute that has the maximum information gain value is selected for the root of the tree. Likewise, all attributes in the dataset are placed in the tree according to their information gain values.
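As an illustration of the root-selection step, the following minimal Python sketch computes entropy and information gain on a tiny made-up table. The attribute names (`safety`, `doors`) and the four rows are assumptions chosen for the example and are not taken from the datasets used in this study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropy after splitting on attr."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Hypothetical toy data: 'safety' separates the classes, 'doors' does not.
    rows = [
        {"safety": "high", "doors": "2"},
        {"safety": "high", "doors": "4"},
        {"safety": "low", "doors": "2"},
        {"safety": "low", "doors": "4"},
    ]
    labels = ["acc", "acc", "unacc", "unacc"]
    gains = {a: information_gain(rows, labels, a) for a in ("safety", "doors")}
    root = max(gains, key=gains.get)   # attribute with the maximum gain
    print(gains, root)
```

In this toy table the attribute that perfectly separates the two classes receives the full entropy of the set as its gain, so it would be chosen for the root.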

RandomTree.
Given a training set with z attributes and m instances, RandomTree is an algorithm for constructing a tree that considers x randomly chosen features (x ≤ z) with instances at each node [24]. Whereas in a standard decision tree each node is split using the best split among all features, in a random tree each node is split using the best split among the subset of attributes randomly chosen at that node. The tree is built to its maximum depth; in other words, no pruning procedure is applied after the tree has been fully built. The algorithm can deal with both classification and regression problems. In the classification task, the predicted class for a sample is determined by traversing the tree from the root node to a leaf according to the question posed about an indicator value at each node. When a leaf node is reached, its label determines the classification decision. The RandomTree algorithm does not need any accuracy estimation metric different from the other classification algorithms.

REPTree.
The reduced error pruning tree (REPTree) algorithm builds a decision/regression tree using information gain/variance and prunes it using a simple and fast technique [25]. The pruning technique starts from the bottom level of the tree (from the leaf nodes) and replaces a node with its most frequent class. This change is accepted only if the prediction accuracy does not deteriorate. In this way, it helps to minimize the size of decision trees by removing sections of the tree that contribute little to classifying instances. The values of numeric attributes are sorted once. Missing values are also dealt with by splitting the corresponding instances into parts.

Ensemble Learning.
Ensemble learning is a machine learning technique that combines a set of individual learners and predicts a final output [26]. First, each learner in the ensemble structure is trained separately and multiple classification models are constructed. Then, the outputs obtained from each model are combined by a voting mechanism. The commonly used voting method for categorical target values is majority class labelling, and the voting methods for numerical target values are average, weighted average, median, minimum, and maximum.
The ensemble-learning approach constructs a strong classifier from multiple individual learners and aims to improve classification performance by reducing the risk of an unfortunate selection of these learners. Many ensemble-based studies [27,28] have shown that ensemble learners give more successful results than classical individual learners.
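The voting mechanisms mentioned above can be sketched in a few lines of Python using only the standard library. The helper names `majority_vote` and `weighted_average` and the sample values are hypothetical, introduced purely for illustration.

```python
from collections import Counter
from statistics import mean, median


def majority_vote(labels):
    """Return the most frequent label among the model outputs (categorical targets)."""
    return Counter(labels).most_common(1)[0][0]


def weighted_average(values, weights):
    """Weighted mean of numerical model outputs."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    class_outputs = ["high", "low", "high"]    # made-up outputs of three classifiers
    numeric_outputs = [10.0, 20.0, 30.0]       # made-up outputs of three regressors
    print(majority_vote(class_outputs))
    print(mean(numeric_outputs), median(numeric_outputs),
          min(numeric_outputs), max(numeric_outputs))
    print(weighted_average(numeric_outputs, [1, 1, 2]))
```

How ties are broken in the majority vote is left unspecified here; a real implementation would need an explicit rule.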
In the literature, the ensemble methods are generally categorized under four techniques: bagging, boosting, stacking, and voting. In one of our studies, we implemented bagging approach by combining multiple neural networks which includes different parameter values for the prediction task in textile sector [26]. In the current research, bagging and boosting (AdaBoost algorithm) methods with ordinal class classifier were applied on transportation datasets. For each method (bagging and boosting), tree-based classification algorithms (C4.5 decision tree, RandomTree, and REPTree algorithms) were used as base learners separately.

Bagging. Bagging (bootstrap aggregating) is a commonly applied ensemble technique which constitutes training subsets by selecting random instances from the original dataset using the bootstrap method. Each classifier in the ensemble structure is trained on a different training set, and so multiple classification models are produced. Then, a new sample is given to each model and these models predict an output. The obtained outputs are aggregated, and a single final output is achieved. The procedure consists of the following steps:
(1) Multiple training datasets $D_1, D_2, \ldots, D_t$ are generated from the original dataset $D$ by bootstrap sampling.
(2) Each dataset $D_i$ is trained by a base learner and multiple classification models are constructed, $M_i = C(D_i)$.
(3) Consensus of classification models is tested to calculate out-of-bag error.
(4) New sample $x$ is given to classifiers as input and the outputs are obtained from each model, $O_i = M_i(x)$.
(5) The outputs of models $\{O_1, O_2, \ldots, O_t\}$ are combined as in

$$M^{*}(x) = \arg\max_{y \in Y} \sum_{i:\, M_i(x) = y} 1.$$

Boosting. In the boosting method, classifiers are trained consecutively to convert weak learners to strong ones. A weight value is assigned to each instance in the training set. Then, in each iteration, the weights of misclassified samples are increased, while those of correctly classified ones are decreased. In this way, the misclassified samples' chances of being in the training set increase. In this study, the AdaBoost algorithm was utilized to implement the boosting method.
AdaBoost. AdaBoost, also known as adaptive boosting, is the most popular boosting algorithm which trains learners by reweighting instances in the training set iteratively. Then, the outputs produced by each learning model are aggregated using a weighted voting mechanism.
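The reweighting idea can be made concrete with a short Python sketch of a single boosting round. The text does not spell out the exact update rule, so the exponential update of the textbook binary AdaBoost is used here as an assumption; the function name `adaboost_round` is hypothetical, and the variant implemented in a particular library may differ in detail.

```python
from math import exp, log


def adaboost_round(weights, correct):
    """One reweighting step of textbook binary AdaBoost (assumed variant).

    weights: current instance weights.
    correct: booleans, True where the current learner classified the instance correctly.
    Returns the normalized new weights and the learner's vote weight alpha.
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must lie strictly between 0 and 0.5")
    alpha = 0.5 * log((1.0 - err) / err)
    # Misclassified instances are scaled up, correctly classified ones scaled down.
    updated = [w * exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


if __name__ == "__main__":
    w0 = [0.25, 0.25, 0.25, 0.25]
    w1, alpha = adaboost_round(w0, [True, True, True, False])
    print(w1, alpha)
```

Under this update rule, the misclassified instances together carry half of the total weight after normalization, which is what forces the next learner to concentrate on them.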

Proposed Method: Ensemble-Based Ordinal Classification (EBOC)
The proposed method, named ensemble-based ordinal classification (EBOC), combines popular ensemble-learning methods, namely AdaBoost and bagging, with the traditional ordinal class classifier algorithm to improve prediction performance. The ordinal class classifier algorithm proposed in [7] is used as the base learner for the ensemble structure of the proposed approach. In addition, tree-based algorithms such as C4.5 decision tree, RandomTree, and REPTree are used as the classifier within the ordinal class classifier algorithm. The first step of the ordinal class classifier algorithm is to reduce the multiclass ordinal classification problem to binary classification problems. To realize this approach, the ordinal classification problem with k different class values is converted to k-1 two-class classification problems. Assume that there is a dataset with four classes (k=4); here, the task is to find the two sides of each binary problem: (i) class C1 against classes C2, C3, and C4; (ii) classes C1 and C2 against classes C3 and C4; and finally (iii) classes C1, C2, and C3 against class C4.
For example, suppose we have the car evaluation dataset [29], which evaluates vehicles according to buying price, maintenance price, and technical characteristics such as comfort, number of doors, person capacity, luggage boot size, and safety of the car. This dataset has an ordinal class attribute with four values: unacc, acc, good, and vgood. First, the four different class values of the original dataset are converted to binary values according to these rules: Classes > "unacc", Classes > "acc", and Classes > "good". If we consider the "Classes > 'unacc'" rule, class values higher than "unacc" are labelled as 1 and the others are labelled as 0. In this way, three different transformed datasets that contain binary class values are obtained. In the next stage, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on each obtained dataset separately. Figure 2 shows a demonstration of how the ordinal class classifier algorithm works on the "car evaluation" dataset.
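The label transformation for the car evaluation example can be sketched as follows. The class order is taken from the text; the function name `to_binary_targets` and the four sample labels are assumptions made for illustration.

```python
CLASS_ORDER = ["unacc", "acc", "good", "vgood"]   # lowest to highest, as in the text


def to_binary_targets(labels, class_order=CLASS_ORDER):
    """Derive the k-1 binary target columns, one per rule 'Classes > threshold'."""
    rank = {c: i for i, c in enumerate(class_order)}
    return {
        f"Classes > {threshold}": [int(rank[y] > rank[threshold]) for y in labels]
        for threshold in class_order[:-1]
    }


if __name__ == "__main__":
    sample_labels = ["unacc", "good", "acc", "vgood"]   # made-up class column
    for rule, column in to_binary_targets(sample_labels).items():
        print(rule, column)
```

Each derived column would replace the original class attribute in one copy of the dataset, giving the three binary datasets on which the base classifier is trained.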
When predicting a new sample, the probabilities are computed for each class value using the k-1 binary classification models. For example, the probability of the "unacc" class value, $P(\text{unacc} \mid x)$, in the sample car dataset is evaluated by $1 - P(\text{Target} > \text{unacc} \mid x)$. Similarly, the other three ordinal class value probabilities are computed. Finally, the class label which has the maximum probability value is assigned to the sample. In general, the probabilities of the class attribute values for the "car evaluation" dataset are calculated as follows:

$$\begin{aligned}
P(\text{unacc}) &= 1 - P(\text{Target} > \text{unacc}),\\
P(\text{acc}) &= P(\text{Target} > \text{unacc}) - P(\text{Target} > \text{acc}),\\
P(\text{good}) &= P(\text{Target} > \text{acc}) - P(\text{Target} > \text{good}),\\
P(\text{vgood}) &= P(\text{Target} > \text{good}).
\end{aligned}$$

The proposed approach (EBOC) utilizes the ordinal classification algorithm as the base learner for the bagging and boosting ensemble methods. This novel ensemble-based approach aims to obtain successful classification results by exploiting the high prediction ability of the ensemble-learning paradigm. The general structure of the proposed approach is presented in Figure 3. When bagging is considered, random samples are selected from the original dataset to produce multiple training sets; when boosting is utilized, samples are selected with specified probabilities based on their weights. After that, the EBOC derives new datasets from the original dataset, each one with a new binary class attribute. Each dataset is given to the ordinal class classifier algorithm as an input. Then, a classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the datasets and multiple ensemble-based ordinal classification models are produced. Each classification model in this system gives an output label, and the majority vote of these outputs is selected as the final class value.

The pseudocode of the proposed approach (EBOC) is presented in Algorithm 1. First, multiple training sets are generated according to the preferred ensemble method (bagging or boosting). Second, the algorithm converts ordinal data to binary data. After that, a tree-based classification algorithm (i.e., C4.5, RandomTree, or REPTree) is applied on the binary training sets and, in this way, various classifiers are produced. After this training phase, if boosting is chosen as the ensemble method, the weight of each sample in the dataset is updated dynamically according to the performance (error rate) of the classifiers in the current iteration. To predict the class label of a new input sample, the probability for each ordinal class is estimated by using the constructed models. The algorithm chooses the class label with the highest probability. Lastly, the majority vote of the outputs obtained from each ordinal classification model is selected as the final output.
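The bagging variant of the steps described above can be sketched end to end in plain Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: the `Stump` class is a hypothetical one-split stand-in for a tree learner such as C4.5, class labels are encoded as ranks 1..k, and the names `OrdinalClassifier`, `eboc_bagging_fit`, and `eboc_predict` are invented for the example. The boosting variant is not shown.

```python
import random
from collections import Counter


class Stump:
    """Toy one-split learner estimating P(label = 1 | x); stand-in for a tree algorithm."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [yi for row, yi in zip(X, y) if row[f] <= t]
                right = [yi for row, yi in zip(X, y) if row[f] > t]
                # Training errors if each side predicts its majority label.
                err = (min(sum(left), len(left) - sum(left))
                       + min(sum(right), len(right) - sum(right)))
                if best is None or err < best[0]:
                    # Laplace-smoothed leaf probabilities.
                    p_left = (sum(left) + 1) / (len(left) + 2)
                    p_right = (sum(right) + 1) / (len(right) + 2)
                    best = (err, f, t, p_left, p_right)
        _, self.f, self.t, self.p_left, self.p_right = best
        return self

    def prob(self, x):
        return self.p_left if x[self.f] <= self.t else self.p_right


class OrdinalClassifier:
    """k-1 binary models, one per test 'rank > i', combined as described in the text."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.models = [Stump().fit(X, [int(yi > i) for yi in y])
                       for i in range(1, self.k)]
        return self

    def predict(self, x):
        g = [m.prob(x) for m in self.models]          # g[i - 1] estimates P(rank > i)
        probs = ([1 - g[0]]
                 + [g[i - 1] - g[i] for i in range(1, self.k - 1)]
                 + [g[-1]])
        return max(range(self.k), key=probs.__getitem__) + 1


def eboc_bagging_fit(X, y, k, n_models=11, seed=0):
    """Train one ordinal classifier per bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]    # sample with replacement
        ensemble.append(OrdinalClassifier(k).fit([X[i] for i in idx],
                                                 [y[i] for i in idx]))
    return ensemble


def eboc_predict(ensemble, x):
    """Majority vote over the ranks predicted by the ensemble members."""
    return Counter(m.predict(x) for m in ensemble).most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up data: one feature, three ordered classes (1 < 2 < 3).
    X = [[v] for v in range(30)]
    y = [1 if v < 10 else 2 if v < 20 else 3 for v in range(30)]
    model = eboc_bagging_fit(X, y, k=3)
    print([eboc_predict(model, [v]) for v in (2, 12, 27)])
```

The design mirrors the text: resampling happens first, the ordinal-to-binary conversion happens inside each ensemble member, and the ensemble's final label is the majority vote over the members' ordinal predictions.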

Experimental Studies
In the experimental studies, the proposed approach (EBOC) was implemented on twelve different benchmark transportation datasets to demonstrate its advantages over the standard nominal classification methods. The application was developed by using Weka open source data mining library [30]. Individual tree-based classification algorithms (C4.5, RandomTree, and REPTree), ordinal classification algorithm [7], and the EBOC approach were applied on the transportation datasets separately and they were compared in terms of accuracy, precision, recall, F-measure, and ROC area. In the EBOC approach, C4.5, RandomTree and REPTree algorithms were also used as a base learner in ordinal class classifiers separately. The obtained experimental results in this study are presented with the help of tables and graphs.

Dataset Description.
In this study, 12 different transportation datasets, which are available in several repositories for public use, were selected to demonstrate the capabilities of the proposed EBOC method. The datasets were obtained from the following data archives: UCI Machine Learning Repository [31], Kaggle [32], Data Mill North [33], and NYC Open Data [34]. Detailed descriptions of the datasets (i.e., what they are and how they are used) are given as follows.
Auto MPG. Auto MPG dataset, which was used in the American Statistical Association Exposition, is presented by the StatLib library which is maintained at Carnegie Mellon University. This dataset is utilized for the prediction of the city-cycle fuel consumption in miles per gallon of the cars according to their characteristics such as model year, the number of cylinders, horsepower, engine size (displacement), weight, and acceleration.
Automobile. The dataset was obtained from the 1985 Ward's Automotive Yearbook. It consists of various automobile characteristics such as fuel type, body style, number of doors, engine size, engine location, horsepower, length, width, height, price, and insurance score. These features are used for the classification of automobiles by predicting their risk factors, which take six different risk rankings ranging from risky (+3) to pretty safe (-3).
Bike Sharing. The data includes the hourly count of rental bikes between the years 2011 and 2012. It was collected by the Capital Bikeshare system in Washington, D.C., where membership, rental, and bike return are automated via a network of kiosk locations. The dataset is presented for the prediction of hourly bike rental counts based on environmental and seasonal settings such as weather situation (e.g., clear, cloudy, and rainy).
Car Evaluation. This dataset is useful for evaluating the quality of cars according to various characteristics such as buying price, maintenance price, number of doors, person capacity, size of luggage boot, and estimated safety of the car. The target attribute in the dataset indicates the overall scores of the cars as unacceptable, acceptable, good, and very good.
Car Sale Advertisements. The data was collected from private car sale advertisements in Ukraine in 2016. It was proposed for the prediction of the seller's price (denominated in USD) in the advertisement according to well-known car features such as model, body type, mileage, engine volume, engine type, and drive type.
NYS Air Passenger Traffic. The data was collected monthly by the Port Authority of New York State between years 1977 and 2015. The aim is to predict the total number of domestic and international passengers for five airports (ACY, EWR, JFK, LGA, and SWF).

Road Traffic Accidents (2017). The dataset contains the records of traffic accidents across the City of Leeds, UK, reported in 2017 with the information of location, number of people and vehicles involved, the type of vehicle, road surface, weather situations, lighting conditions, and the age and sex of casualty. The aim is to predict the severity of casualty as slight, serious, or fatal.

SF Air Traffic Landings Statistics and SF Air Traffic Passenger Statistics. These two separate datasets include aircraft landings and passenger statistics of San Francisco International Airport recorded during the period from July 2005 to March 2018. These datasets are used to predict monthly landing and passenger counts, respectively.
Smart City Traffic Patterns. This dataset includes the traffic patterns of the four junctions of a city collected between 2015 and 2017. The aim is to manage the traffic of the city better and to provide input on infrastructure planning for the future to improve the efficiency of services for the citizens. To serve this purpose, the number of vehicles in the traffic is predicted to implement a robust traffic system for the city by being prepared for traffic peaks.
Statlog (Vehicle Silhouettes). The Statlog dataset contains the features of vehicle silhouettes extracted by Hierarchical Image Processing System (HIPS). The vehicle may be viewed from one of many different angles. The aim is to classify a given silhouette using a set of features extracted from the image.
Traffic Volume Counts (2012-2013). The dataset includes the hourly traffic volume counts collected by New York Metropolitan Transportation Council between years 2012 and 2013. The traffic counts were measured on various roads from one intersection to another with a specified direction (NB, SB, WB, and EB). The attribute that will be predicted in this dataset is traffic volume at 11:00-12:00 AM.
Figure 4: The 10-fold cross-validation process.
Before the application of the nominal and ordinal classification algorithms, the datasets were passed through data preprocessing steps. In this study, the ID attributes in the transportation datasets were eliminated in the data reduction step. The date attributes were split into three parts (day, month, and year), because they provide more useful information for the prediction task. In addition, continuous class values were discretized into 3 bins using the equal frequency technique, since ordinal classification algorithms require categorical ordinal data.
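The equal frequency discretization of a continuous class attribute into 3 ordered bins can be illustrated with a minimal sketch. It assumes plain Python, places each cut point at an observed value, and uses hypothetical fuel-consumption numbers; the function names are illustrative, and the discretization filter actually used in the experiments may place its cut points differently.

```python
def equal_frequency_cut_points(values, n_bins=3):
    """Return n_bins - 1 cut points that split the sorted values into groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut point is the first value of the (k+1)-th group.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def to_ordinal_bin(value, cut_points):
    """Map a continuous value to an ordinal bin index (0 = lowest bin)."""
    return sum(value >= cut for cut in cut_points)


# Hypothetical continuous class values (not taken from any of the datasets).
mpg = [9, 10, 13, 15, 18, 21, 24, 27, 30, 33, 36, 40]
cuts = equal_frequency_cut_points(mpg)
names = ["low", "medium", "high"]  # ordered class labels, assumed for illustration
ordinal_labels = [names[to_ordinal_bin(v, cuts)] for v in mpg]
print(cuts)
print(ordinal_labels)
```

Because equal values always fall on the same side of a cut point, heavily tied data can produce bins of unequal size; the class distributions per bin are therefore worth checking after discretization.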
Basic characteristics of the investigated transportation datasets are given in Table 1. These are the number of instances, attributes and classes, target attribute, data preprocessing operations, and class distributions in each bin.

Experimental Work.
In each experiment, four methods were compared on the transportation datasets: (i) individual tree-based classification algorithms, (ii) ordinal class classifier, (iii) boosting-based ordinal classification, and (iv) bagging-based ordinal classification. They were compared by using the n-fold cross-validation technique with n set to 10. In this validation technique, the entire dataset is divided into n equal-sized parts; n-1 of them are used in the training phase of the classification model, and the remaining part is used in the testing phase. This process is repeated n times, changing the parts used for training and testing each time. When the cross-validation process is finished, the accuracy values obtained in each step are averaged, and a single accuracy value is produced as the measure of success of the algorithm. Figure 4 represents the 10-fold cross-validation process.
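The cross-validation procedure described above can be sketched as follows. This is a minimal illustration in plain Python, assuming an unstratified random split; the majority-class learner and the toy data are hypothetical placeholders standing in for the tree-based classifiers and the transportation datasets.

```python
import random
from collections import Counter


def k_fold_indices(n_instances, k=10, seed=0):
    """Shuffle the instance indices and deal them into k folds whose sizes differ by at most one."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate(instances, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the remaining fold, and average the k accuracies."""
    folds = k_fold_indices(len(instances), k, seed)
    accuracies = []
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        model = train_fn([instances[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, instances[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Placeholder learner (an assumption for illustration): always predicts the majority class.
def train_majority(train_instances, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, instance):
    return model


toy_instances = list(range(50))
toy_labels = ["low"] * 30 + ["high"] * 20
mean_accuracy = cross_validate(toy_instances, toy_labels, train_majority, predict_majority)
print(mean_accuracy)
```

Any learner exposing a train and a predict function can be plugged in through `train_fn` and `predict_fn`, so the same loop serves all four compared methods.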
In this study, alternative tree-based ordinal classification algorithms were compared according to accuracy, precision, recall, F-measure, and ROC area measures. The metrics are explained with their abbreviations, formulas, and definitions in Table 2.
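As a rough illustration of how these measures are obtained from predictions, the sketch below computes accuracy and the per-class precision, recall, and F-measure in plain Python. The one-vs-rest treatment of each class and the toy labels are assumptions made for illustration; how the per-class values are averaged in the reported results is not restated here, and ROC area is omitted.

```python
def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F-measure for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Hypothetical true and predicted ordinal classes.
y_true = ["low", "low", "medium", "medium", "high", "high"]
y_pred = ["low", "medium", "medium", "medium", "high", "low"]
overall_accuracy = accuracy(y_true, y_pred)
low_metrics = precision_recall_f(y_true, y_pred, "low")
print(overall_accuracy, low_metrics)
```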

Experimental Results. In this study, three different experiments with three different tree-based classification algorithms (C4.5, RandomTree, and REPTree) were performed to compare the classification success of the proposed EBOC approach with the existing individual algorithms and the ordinal class classifier algorithm. To evaluate the classification performances of the algorithms on a specific transportation dataset, accuracy rates were calculated using the n-fold cross-validation technique with n set to 10. In each experiment, one of the tree-based classification algorithms was used as the base learner. For example, in the first experiment, four methods were applied and compared on 12 different transportation datasets: (i) C4.5: the individual

ROC Area (Receiver Operating Characteristic Area): the area under the curve which is generated to compare correctly and incorrectly classified instances.
Tables 3, 4, and 5 show the comparative results of the implemented techniques on the twelve benchmark datasets in terms of accuracy rates. The accuracy rates which are higher than individual classification
Tables 6, 7, and 8 give all pairwise combinations of the tree-based algorithms with their ordinal and ensemble-learning versions. Each cell in a matrix gives the number of wins, losses, and ties between the approach in that row and the approach in that column. For example, for the pair C4.5 and Bag.Ord.C4.5, 2-10-0 indicates that the standard C4.5 algorithm is better than Bag.Ord.C4.5 on only 2 datasets, while Bag.Ord.C4.5 is better on 10 datasets, and their accuracies are not equal on any dataset. When all matrices are examined, it is seen that the proposed EBOC approach outperforms all other methods.
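The win-loss-tie entries of such a pairwise matrix can be derived from two per-dataset accuracy lists, as in the following minimal sketch. The accuracy values are hypothetical, and the tolerance used to declare a tie is an assumption.

```python
def win_loss_tie(acc_row, acc_col, tol=1e-9):
    """Count the datasets on which the row approach wins, loses, or ties against the column approach."""
    wins = sum(a > b + tol for a, b in zip(acc_row, acc_col))
    losses = sum(b > a + tol for a, b in zip(acc_row, acc_col))
    ties = len(acc_row) - wins - losses
    return f"{wins}-{losses}-{ties}"


# Hypothetical accuracy rates (%) of two approaches on four datasets.
standard_tree = [80.0, 75.5, 90.1, 66.0]
bagged_ordinal = [82.3, 75.5, 88.0, 70.2]
cell = win_loss_tie(standard_tree, bagged_ordinal)
print(cell)
```

Swapping the two arguments yields the mirrored cell, which is why only one triangle of each matrix needs to be reported.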
The graphs given in Figures 5, 6, and 7 show the average ranks of the tree-based algorithms (C4.5, RandomTree, and REPTree, respectively) together with their ordinal and ensemble-learning (AdaBoost and bagging) versions. In the ranking method, each approach used in this study is rated according to its accuracy score on a dataset: rank 1 is given to the classifier with the highest classification accuracy, and the rank value increases until the worst classifier has been ranked. In the case of a tie, the average of the tied ranks is assigned to each tied classifier. Then, the mean of the ranks of each classifier over all datasets is computed as its average rank. According to the comparative results, the EBOC approach with the bagging method has the best performance in Figures 5 and 6, because it gives the lowest average rank.
The proposed algorithms (Ada.Ord.* and Bag.Ord.*) were applied on the transportation datasets and were compared with ordinal and nominal (standard) classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area metrics. Measures besides accuracy are required to evaluate all aspects of the classifiers, and they provide additional insight into their performance. Table 9 shows the average results obtained from the 12 transportation datasets in each experiment. Except for accuracy, the metrics listed in Table 9 take values between 0 and 1. A higher value means that the algorithm is more successful than the others. Among C4.5 decision
Pairwise win-loss-tie counts for the C4.5-based algorithms (C4.5, Ord.C4.5, Ada.Ord.C4.5, and Bag.Ord.C4.5). C4.5: 3-9-0, 3-9-0, 7-4-1. Ord.C4.5: 1-11-0, 3-9-0. Ada.Ord.C4.5: 6-6-0.
H0: There are no performance differences among the classifiers on the datasets.
H1: There are performance differences among the classifiers on the datasets.
The p-value is a probability between 0 and 1, and it can also be expressed as a percentage (e.g., 0.1 = 10%). With a small p-value, we reject the null hypothesis (H0) and conclude that the classification results are significantly different.

Statistical Tests for Comparing Multiple Classifiers.
Multiple comparison tests (MCTs) are used to establish a statistical comparison of the results reported among various classification algorithms. We performed two well-known MCTs to determine whether the classifiers differ significantly: the Friedman test and the Quade test.
Friedman Test. The Friedman test is a nonparametric statistical test that aims to detect significant differences between the behaviors of two or more algorithms. Initially, it ranks the algorithms for each dataset separately between 1 (smallest) and 4 (largest). In case of ties, the average rank is assigned to them. After that, the Friedman test statistic ($\chi_F^2$) is calculated according to the following:
$$\chi_F^2 = \frac{12}{b\,t\,(t+1)} \sum_{j=1}^{t} R_j^2 - 3\,b\,(t+1)$$
where $b$ is the number of datasets, $t$ is the number of classifiers, and $R_j$ is the total of the ranks for the $j$th classifier among all datasets. The Friedman test statistic is approximately chi-square ($\chi^2$) distributed with $t-1$ degrees of freedom (df), and the null hypothesis is rejected if $\chi_F^2 > \chi^2_{t-1,\alpha}$ at the corresponding significance level $\alpha$.
The obtained Friedman test value for the four C4.5-based classifiers is 10.525. Since the obtained value is greater than the critical value ($\chi^2_{df=3,\,\alpha=0.05} = 7.81$), the null hypothesis (H0) is rejected, so it is concluded that the four classifiers are significantly different. The same holds for the RandomTree-based and REPTree-based algorithms, which have test values of 15.3 and 20.1, respectively. The obtained p-values for the C4.5, RandomTree, and REPTree related EBOC results (Tables 3, 4, and 5) are 0.01459, 0.00158, and 0.00016, which are well below the 0.05 level of significance. Thus, the differences between their performances are unlikely to have occurred by chance.
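The Friedman computation can be sketched in plain Python as follows. The accuracy table is hypothetical, the function names are illustrative, and the statistic is the rank-sum form without a tie correction, which is an assumption about the variant used. The p-value helper uses a closed form that is valid only for 3 degrees of freedom, that is, for four classifiers.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(accuracy_table):
    """Rows are datasets, columns are classifiers; returns the statistic and the rank totals."""
    b = len(accuracy_table)
    t = len(accuracy_table[0])
    rank_sums = [0.0] * t
    for row in accuracy_table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    chi2 = 12.0 / (b * t * (t + 1)) * sum(r * r for r in rank_sums) - 3.0 * b * (t + 1)
    return chi2, rank_sums


def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical accuracies: 3 datasets (rows) by 4 classifiers (columns).
table = [[70, 72, 75, 78],
         [60, 65, 64, 69],
         [80, 80, 85, 90]]
chi2, rank_sums = friedman_statistic(table)
print(chi2, rank_sums, chi2_sf_df3(chi2))
```

Evaluating `chi2_sf_df3(10.525)` gives approximately 0.0146, in line with the p-value reported for the C4.5-based classifiers.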
Quade Test. Like the Friedman test, the Quade test is a nonparametric test, which is used to show that the differences among classifiers are significant. First, the performance results of each classifier are ranked within each dataset to yield $R_{ij}$, in the same way as the Friedman test does, where $b$ is the number of datasets and $t$ is the number of classifiers, for $i = 1, 2, \ldots, b$ and $j = 1, 2, \ldots, t$. Then, the range for each dataset is calculated by finding the difference between the largest and the smallest observations within that dataset, and the datasets are ranked according to their ranges. The obtained rank for dataset $i$ is denoted by $Q_i$. The weighted average adjusted rank for dataset $i$ with classifier $j$ is then computed as $S_{ij} = Q_i \left(R_{ij} - (t+1)/2\right)$. The Quade test statistic is then given by
$$\hat{F} = \frac{(b-1)\,\frac{1}{b}\sum_{j=1}^{t} S_j^2}{\sum_{i=1}^{b}\sum_{j=1}^{t} S_{ij}^2 - \frac{1}{b}\sum_{j=1}^{t} S_j^2}$$
where $S_j = \sum_{i=1}^{b} S_{ij}$ is the sum of the weighted ranks for each classifier. The $\hat{F}$ value is tested against the F-distribution for a given $\alpha$ with $df_1 = (t-1)$ and $df_2 = (b-1)(t-1)$ degrees of freedom. If $\hat{F} > F_{(t-1),(b-1)(t-1),\alpha}$, then the null hypothesis is rejected. Moreover, the p-value could be computed through normal approximations.
The p-values computed through the Quade statistical tests for the C4.5, RandomTree, and REPTree related EBOC algorithms are 0.013398, 0.000018, and 0.000522, respectively. The test results strongly suggest the existence of significant differences among the considered algorithms at the level of significance $\alpha = 0.05$. Hence, we reject the null hypothesis, which states that all classification algorithms have the same performance. Thus, it is possible to say that at least one of them behaves differently.
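A minimal plain-Python sketch of the Quade computation is given below, assuming the standard form of the statistic based on the weighted ranks $S_{ij}$ and their column sums. The accuracy table is hypothetical and the function names are illustrative; the resulting value would still have to be compared against an F critical value or converted to a p-value with a statistics library.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def quade_statistic(accuracy_table):
    """Rows are datasets, columns are classifiers; returns (F, (df1, df2))."""
    b = len(accuracy_table)
    t = len(accuracy_table[0])
    within_ranks = [average_ranks(row) for row in accuracy_table]               # R_ij
    dataset_ranks = average_ranks([max(row) - min(row) for row in accuracy_table])  # Q_i
    weighted = [[dataset_ranks[i] * (within_ranks[i][j] - (t + 1) / 2)
                 for j in range(t)] for i in range(b)]                           # S_ij
    column_sums = [sum(weighted[i][j] for i in range(b)) for j in range(t)]      # S_j
    total = sum(s * s for row in weighted for s in row)
    between = sum(s * s for s in column_sums) / b
    # When total equals between, the classifiers are perfectly separated and F is unbounded.
    f_value = float("inf") if total == between else (b - 1) * between / (total - between)
    return f_value, (t - 1, (b - 1) * (t - 1))


# Hypothetical accuracies: 3 datasets (rows) by 4 classifiers (columns).
table = [[70, 72, 75, 78],
         [60, 65, 64, 69],
         [80, 80, 85, 90]]
f_value, degrees = quade_statistic(table)
print(f_value, degrees)
```

Compared with the Friedman test, the weights $Q_i$ give more influence to datasets on which the classifiers' accuracies are spread further apart.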

Statistical Tests for Comparing Paired Classifiers.
In addition to multiple comparison tests, we also conducted a pairwise comparison test to demonstrate that the proposed algorithms (Ada.Ord.* and Bag.Ord.*) differ significantly from the others when individually compared. The Wilcoxon signed rank test was applied to examine pairwise performance differences.

Wilcoxon Signed Rank Test. This statistical test consists of the following steps to check the validity of the null hypothesis: (i) calculate the performance differences between two algorithms for each dataset, (ii) order the differences according to their absolute values from smallest to largest, (iii) rank the absolute values of the differences, starting with the smallest as 1 and assigning average ranks in case of ties, (iv) calculate $W^+$ and $W^-$ by summing the ranks of the positive and negative differences separately, (v) find $W$ as the smaller of the sums, $W = \min(W^+, W^-)$, (vi) calculate the z-score as $z = W/\sigma_W$ and find the p-value from the distribution table, and (vii) determine the significance level according to the computed p-value [36] and reject the null hypothesis if the p-value is less than a certain risk threshold (e.g., 0.1 = 10%).
Based on all statistical tests (Friedman, Quade, and Wilcoxon signed rank test), we can safely reject the null hypothesis (i.e., there are no performance differences among the classifiers on the datasets), since the p-values computed through the statistics are less than the significance level (0.05). Thus, the proposed approach (EBOC) has a statistically significant effect on the response at the 95.0% confidence level.
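The steps of the Wilcoxon signed rank test can be sketched in plain Python as follows. The accuracy lists are hypothetical and the function names are illustrative. Two choices here are assumptions rather than details taken from the text: datasets with a zero difference are dropped, and the z-score uses the common normal approximation $z = (W - \mu_W)/\sigma_W$ with $\mu_W = n(n+1)/4$ and $\sigma_W = \sqrt{n(n+1)(2n+1)/24}$. For a small number of datasets, exact critical-value tables are often preferred over this approximation.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def wilcoxon_signed_rank(acc_a, acc_b):
    """Return (W+, W-, W, z, two-sided p) for paired per-dataset accuracies."""
    diffs = [a - b for a, b in zip(acc_a, acc_b) if a != b]  # zero differences dropped
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sigma_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return w_plus, w_minus, w, z, p_value


# Hypothetical accuracy rates (%) of two approaches on eight datasets.
baseline = [70, 64, 81, 75, 90, 68, 77, 59]
ensemble = [74, 69, 80, 81, 92, 75, 85, 62]
w_plus, w_minus, w, z, p_value = wilcoxon_signed_rank(baseline, ensemble)
print(w_plus, w_minus, w, z, p_value)
```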

Conclusions and Future Works
As a result of developments in the field of transportation, enormous amounts of raw data are generated every day. This situation creates the potential to discover knowledge patterns or rules from the data and to model the behaviour of a transportation system using machine learning techniques. To serve this purpose, this study focuses on the application of ordinal classification algorithms on real-world transportation datasets. It proposes a novel ensemble-based ordinal classification (EBOC) approach. This approach converts the original ordinal class problem into a series of binary class problems and uses the ensemble-learning paradigm (boosting and bagging) at the same time. To the best of our knowledge, this is the first study in which ordinal classification methods and our proposed approach were applied in the transportation sector. In the experimental studies, the proposed model with the tree-based learners (C4.5, RandomTree, and REPTree) was implemented on twelve benchmark transportation datasets that are available for public use. The proposed EBOC method was compared with the ordinal class classifier and traditional tree-based classification algorithms in terms of accuracy, precision, recall, F-measure, and ROC area. The results indicate that the EBOC approach provides more accurate classification results than these methods. Moreover, statistical tests (Friedman, Quade, and Wilcoxon signed rank tests) were used to show that the classification accuracies obtained from the proposed algorithms (Ada.Ord.* and Bag.Ord.*) are significantly different from those of the traditional methods. Therefore, the proposed EBOC method can assist in making right decisions in transportation. Our findings demonstrate that the improvement in performance is a result of exploiting ordering information and applying an ensemble strategy at the same time.
As future work, different types of ensemble-based ordinal classifiers can be developed using different ensemble methods such as stacking and voting, and using different classification algorithms such as Naive Bayes, k-nearest neighbour, neural network, and support vector machine. In addition, ensemble clustering models can be improved to cluster transportation data. Furthermore, a comparative study which implements ensemble-learning and deep learning paradigms can be performed in transportation fields.