A data-driven approach for characterising the charging demand of electric vehicles: A UK case study

• 21,918 charging events from 255 different charging stations in the UK were analysed.
• A data pre-processing methodology for dealing with EVs charging data was presented.
• A data mining model was developed to analyse the EVs charging data.
• A fuzzy logic decision model was developed to characterise the EVs charging demand.

This version is being made available in accordance with publisher policies. See http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications made available in ORCA are retained by the copyright holders.

Introduction
Electric Vehicles (EVs) reduce transportation-related emissions, lower the energy cost of driving and, in some cases, eliminate the use of fossil fuels. Total electricity demand is expected to grow as the number of EVs increases [1]. The impact of EVs charging on distribution networks has been investigated in the literature. Most of these studies use synthetic data to assess the impact of the EVs charging load because access to real EVs charging data is limited. In [2-19], data from travel surveys are used to create EVs charging load profiles, assuming that EVs are driven like conventional internal combustion engine vehicles.
Although EV adoption is at an early stage, some utilities and aggregators are already collecting information from charging stations. A limited number of EV pilots exist around the world, allowing some preliminary studies of charging demand profiles. In [20], a statistical analysis of 4933 charge events in the Victorian EVs Trial in Australia was performed. Statistical models for charge duration, daily charge frequency, energy consumed, start time of charge event and time to next charge event were estimated to express the uncertainty of usage patterns due to different user behaviours. Data from the Western Australian Electric Vehicle Trial (2010-2012) were analysed in [21,22] to investigate drivers' recharging behaviours and patterns. In [23], data on 7704 recharging events from the SwitchEV trials in the north east of England were used to analyse the recharging patterns of 65 EVs. The results showed that minimal recharging occurred during off-peak times. In [24], data from the same project were combined with low voltage smart meter data from the Customer Led Network Revolution (CLNR) project, and the impact of the combined demand profile was assessed on three different distribution networks. The results showed that the spatial and temporal diversity of EVs charging demand reduces its impact on those distribution networks. Finally, data from over 580,000 charging sessions at 2000 non-residential electric vehicle supply equipment (EVSE) units located in Northern California were analysed in [25]. The aim of this analysis was to investigate the potential benefits of smart charging using the extracted information on actual trips and customer characteristics.
Monitoring the charging events will inevitably create large volumes of data. These data require effective data mining methods for their analysis in order to extract useful information. In [26-28], various data mining techniques were utilised to address challenges in the energy sector, such as load forecasting and profiling. In [29-31], data mining modelling frameworks were applied to electricity consumption data to support the characterisation of end-user demand profiles.
In this paper, a framework was developed to characterise the EVs charging demand of a geographical area. The technical contributions of this paper are summarised below:
(i) Real EVs charging data from the UK were acquired and analysed. The diverse data were organised and classified into attributes. To the best of the authors' knowledge, this is the first time that real EVs charging data have been presented at this level of detail.
(ii) A comprehensive data cleaning and formatting methodology, developed specifically for EVs charging data, is presented.
(iii) A data mining model was developed to extract useful information. Three key characteristics of EVs charging demand in a geographical area were investigated using the proposed methodology, namely the shape of the typical daily profile, the predictability with respect to weather and the trend. Clustering, correlation and regression analysis were performed to study each characteristic, using a factor to quantify each one. Analysing these characteristics allowed the potential risks and uncertainties that affect the mid-term normal operation of the corresponding distribution network to be assessed.
(iv) A fuzzy logic decision model was developed that aggregates the three factors into a single "risk level" index. The "risk level" index was defined to characterise the EVs charging demand, reflecting its potential impact on the energy demand in a geographical area. Areas with high "risk level" values imply a potential risk for the mid-term normal operation of the distribution networks, and such analysis could be important for the distribution network operator (DNO). To date, no similar research has quantified the mid-term relative risk of EVs charging demand among different geographical areas independently of their actual distribution networks.
(v) Furthermore, this paper fills a gap in the literature on handling real EVs charging data by proposing a complete data analysis methodology.
The rest of the paper is organised as follows: Section 2 describes the real EVs charging data analysed. Section 3 describes the proposed methodology for characterising the EVs charging demand. A case study is presented in Section 4, in which the model is applied to real EVs charging events from the UK to study the charging demand characteristics and assess their potential impact. Finally, conclusions are drawn in Section 5.

Data description
EVs charging demand data were obtained from the Plugged-in Midlands (PiM) project (http://www.pluggedinmidlands.co.uk/). The Plugged-in Midlands project, managed by Cenex, is one of the eight 'Plugged-in Places' projects supported by OLEV, the Office for Low Emission Vehicles in the UK. Cenex provided two datasets, containing information on the charging events and the charging stations respectively. The charging events dataset consists of 21,918 charging events from 255 different charging stations and 587 unique EVs drivers. It includes the connection/disconnection times and the energy of each charging event for the period 2012-2013, recorded per event. The charging station dataset contains time-independent information on the location and technical specifications of all charging points (e.g. the charging power rate). The contents of the two datasets are listed in Tables 1 and 2.
An additional dataset was acquired from the UK Met Office, containing weather information for the Midlands, the geographical area under study. This dataset includes the values of various weather attributes (e.g. air temperature) with daily granularity for the period 2012-2013. The weather attributes are listed in Table 3.

Methodology
The characterisation framework consists of three models: (i) the Data Pre-processing model, which prepares the data for the Data Mining model; (ii) the Data Mining model; and (iii) the Fuzzy Based Characterisation Model. The Data Mining model consists of three modules, namely the Clustering Module, the Correlation Module and the Regression Module, which were used to investigate the shape of the typical daily profile, the predictability with respect to weather and the trend of EVs charging demand, respectively. The Fuzzy Based Characterisation Model aggregates the outputs of the Data Mining model into a "risk level" index of EVs charging demand in a geographical area using fuzzy logic. The characterisation framework is illustrated in Fig. 1.

Data pre-processing model
Data on Connection Time, Disconnection Time, Energy Drawn, Charging Station ID, Charger Type and County were selected and merged into one dataset (the EV dataset). The EV dataset and the weather dataset were cleaned by removing missing and incorrect values. In the EV dataset, charging events with zero or negative energy were removed. Charging events whose average charging power (energy drawn divided by connection duration) was higher than the nominal charger rate were corrected by recalculating the actual charging duration from the nominal charger power rate. This correction is based on the assumption that some EVs may be connected (parked) at a charging station without charging; therefore, the duration for which an EV is connected to a charging station can differ from its actual charging duration. Duplicate data entries were also identified and removed from both datasets.
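The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the `ChargingEvent` record and its field names are hypothetical, the rated power is assumed to have been joined in from the charging station dataset, and events whose disconnection time is not after their connection time are assumed to count as incorrect values and are dropped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChargingEvent:
    # Hypothetical record layout mirroring the selected attributes.
    station_id: str
    connection: datetime
    disconnection: datetime
    energy_kwh: float
    rated_kw: float  # nominal charger power rate (assumed joined from the station dataset)


def clean_events(events):
    """Return (event, charging_hours) pairs after applying the cleaning rules."""
    cleaned, seen = [], set()
    for ev in events:
        if ev.energy_kwh <= 0:
            continue  # zero/negative energy: invalid record
        if ev in seen:
            continue  # exact duplicate entry
        seen.add(ev)
        hours = (ev.disconnection - ev.connection).total_seconds() / 3600
        if hours <= 0:
            continue  # disconnection not after connection: treated as incorrect (assumption)
        if ev.energy_kwh / hours > ev.rated_kw:
            # Average power above the nominal rate: recompute duration at the nominal rate
            hours = ev.energy_kwh / ev.rated_kw
        cleaned.append((ev, hours))
    return cleaned
```

For instance, an event that delivers 14 kWh within a one-hour connection on a 7 kW charger is kept, but its charging duration is recomputed as 2 hours.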
Data on each charging event are recorded at the charging station and then forwarded to one or more data collection centres. This process involves a number of components and communication links, which increases the risk of a failure somewhere in the chain.
Corrupted or missing data are therefore not rare in such complex communication networks. A careful analysis at this stage is also useful for identifying the location or ID of the station that recorded the corrupted data, since such data may indicate abnormal operation.
The next stage of the Data Pre-processing model is the Formation stage. The EV dataset was formatted using a Matlab script into three time series: an hourly power time series, a daily peak power time series and a monthly energy time series. The hourly power time series was transformed into daily vectors (each of 24 values) and forwarded to the Clustering Module, whereas the monthly energy time series was forwarded to the Regression Module. All the attributes of the weather dataset were formatted into daily time series and merged with the daily peak power time series. The resulting combined time series was forwarded to the Correlation Module. The data pre-processing procedure is presented in Fig. 2.
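The Formation stage can be illustrated with a sketch that turns charging events into 24-value daily vectors and then derives the daily peaks and monthly energy. It assumes, as a simplification not stated in the text, that each event draws a constant power (energy divided by charging duration) starting at its connection time; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_profiles(events):
    """Spread each event's energy over clock-hour bins, assuming constant power.

    events: iterable of (start: datetime, charging_hours: float, energy_kwh: float).
    Returns {date: [24 hourly average powers in kW]}.
    """
    days = defaultdict(lambda: [0.0] * 24)
    for start, hours, energy in events:
        power = energy / hours  # constant charging power (assumption)
        t, end = start, start + timedelta(hours=hours)
        while t < end:
            next_hour = t.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            seg_end = min(end, next_hour)
            seg_h = (seg_end - t).total_seconds() / 3600
            days[t.date()][t.hour] += power * seg_h  # kWh in a 1-h bin equals average kW
            t = seg_end
    return dict(days)


def daily_peaks(profiles):
    """Daily peak power (kW) for the Correlation Module."""
    return {day: max(vec) for day, vec in profiles.items()}


def monthly_energy(profiles):
    """Monthly energy (kWh) for the Regression Module."""
    totals = defaultdict(float)
    for day, vec in profiles.items():
        totals[(day.year, day.month)] += sum(vec)
    return dict(totals)
```

Because each bin spans one hour, the energy accumulated in a bin equals its average power, so the same daily vectors feed both the daily peak and the monthly energy series; events that run past midnight are split across two days.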

Data mining model
The Data Mining Model consists of a Clustering Module, Correlation Module and Regression Module. These modules were used to investigate the shape of the typical daily EVs charging demand profile, the predictability with respect to weather and the trend of EVs charging demand respectively.

Clustering module
The clustering module creates typical daily EVs charging demand profiles of a geographical area, according to the load demand of the corresponding charging stations. These profiles are related to the aggregated daily pattern of the EVs charging demand of a specific geographical area.
The k-means clustering method described in [32,33] was used in this module. The algorithm first selects k random daily vectors (the input from the Data Pre-processing Model) as the initial cluster centroids and calculates the distance from each daily vector to every centroid. Each daily vector is assigned to the cluster whose centroid is nearest. The new cluster centroids are then obtained as the average of the daily vectors assigned to each cluster. These two steps are repeated until the assignments no longer change, i.e. until the sum of squared distances between the daily vectors and their cluster centroids reaches a minimum. This is expressed by Eq. (1):
$$\arg\min_{c} \sum_{i=1}^{k} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2 \tag{1}$$
where $c_i$ is the set of daily vectors that belong to the $i$th cluster, $x$ is a daily vector in $c_i$ and $\mu_i$ is the position of the $i$th cluster centroid. The method requires the number k of clusters to be defined a priori. The Davies-Bouldin evaluation criterion was used to select k [34,35]. This criterion is based on the ratio of within-cluster to between-cluster distances and is defined by Eq. (2):
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left( \frac{\bar{d}_i + \bar{d}_j}{d_{ij}} \right) \tag{2}$$
where $\bar{d}_i$ is the average distance between each point in the $i$th cluster and the centroid of the $i$th cluster, $\bar{d}_j$ is the average distance between each point in the $j$th cluster and the centroid of the $j$th cluster, and $d_{ij}$ is the distance between the centroids of the $i$th and $j$th clusters. The maximum over $j$ represents the worst-case within-to-between cluster ratio for the $i$th cluster. The best clustering solution has the smallest Davies-Bouldin index value, so an additional step is needed to evaluate the choice of k for our dataset. A range of 1-20 clusters was considered, with 20 found to be a reasonable maximum [36], and the best number of clusters within this interval was selected iteratively.
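A plain-Python sketch of the k-means iterations of Eq. (1) and the Davies-Bouldin selection of Eq. (2) follows. It is illustrative only and not the authors' implementation. For reproducibility, it seeds the centroids with a deterministic farthest-point rule instead of the random selection described above. Because the Davies-Bouldin index is undefined for a single cluster, and trivially zero when every vector forms its own cluster, the search runs from k = 2 up to the smaller of `k_max` and n - 1.

```python
import math


def dist(a, b):
    """Euclidean distance between two daily vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Element-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def init_centroids(data, k):
    # Deterministic farthest-point seeding, used here instead of random picks
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda x: min(dist(x, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(data, k, max_iter=100):
    """Lloyd's iterations: assign to nearest centroid, then recompute centroids."""
    centroids = init_centroids(data, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: dist(x, centroids[i])) for x in data]
        if new_labels == labels:
            break  # assignments stable: Eq. (1) at a (local) minimum
        labels = new_labels
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = mean(members)
    return centroids, labels


def davies_bouldin(data, centroids, labels):
    """Eq. (2): mean over clusters of the worst (d_i + d_j) / d_ij ratio."""
    k = len(centroids)
    scatter = []
    for i in range(k):
        members = [x for x, lab in zip(data, labels) if lab == i]
        scatter.append(sum(dist(x, centroids[i]) for x in members) / max(len(members), 1))
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k


def best_k(data, k_max=20):
    """Pick the k in 2..k_max with the smallest Davies-Bouldin index."""
    scores = {}
    for k in range(2, min(k_max, len(data) - 1) + 1):
        centroids, labels = kmeans(data, k)
        scores[k] = davies_bouldin(data, centroids, labels)
    return min(scores, key=scores.get), scores
```

On three well-separated groups of vectors, three clusters give a lower Davies-Bouldin value than two, so `best_k` returns 3 when the search is limited to k = 2 and 3.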
By applying the k-means clustering method to the dataset, the k cluster centroids are obtained, along with the number of daily vectors $w_i$ assigned to each cluster. The steps of the Clustering Module are presented in Fig. 3.
The most representative cluster centroid (the one with the highest $w_i$) was used as the typical daily EVs charging demand profile of an area. From this profile, an index k was defined to express the proportion of the daily EVs charging demand that falls within the peak hours (17:00-20:00) [37]. The index k is calculated using Eq. (3):
$$k = \frac{E_{peak}}{E_{total}} \tag{3}$$
where $E_{peak}$ is the charging load during the peak hours and $E_{total}$ is the total daily charging load.
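Selecting the typical profile and computing the index k of Eq. (3) takes only a short calculation. In this sketch, the hourly profile is a 24-value list in kW. The peak window 17:00-20:00 is assumed to cover the hourly bins starting at 17:00, 18:00 and 19:00, and the function names are hypothetical.

```python
from collections import Counter

# Assumption: the 17:00-20:00 peak covers the hourly bins starting at 17, 18 and 19.
PEAK_HOURS = (17, 18, 19)


def typical_profile(centroids, labels):
    """Centroid of the most populated cluster (largest w_i)."""
    largest, _ = Counter(labels).most_common(1)[0]
    return centroids[largest]


def peak_share(profile, peak_hours=PEAK_HOURS):
    """Eq. (3): E_peak / E_total for a 24-value hourly profile (1-h bins, so kW == kWh)."""
    e_total = sum(profile)
    e_peak = sum(profile[h] for h in peak_hours)
    return e_peak / e_total if e_total > 0 else 0.0
```

A profile with 4 kWh at 10:00 and 1 kWh at 18:00, for example, has k = 1/5 = 0.2.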

Correlation module
According to [38], weather affects road traffic congestion and the driving behaviour of car owners. In [39-41], the factors that affect the energy consumption of EVs were analysed. Cold weather reduces battery performance and efficiency. Additionally, heating the interior of an EV drains the battery significantly. In [42], the impact of cold ambient temperatures on running fuel use was investigated. Since EVs share the same roads, weather will also affect their energy consumption and thus their charging demand. Identifying strong hidden relationships between weather attributes and load demand improves the forecasting accuracy of a prediction model [43].
Pearson's correlation coefficient (r) was used in this module to measure the correlation between each weather attribute and the daily peak power of EVs charging demand in a geographical area. The weather attribute with the maximum absolute correlation coefficient across all peak power-weather pairs is identified as the most influential.
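The correlation step can be sketched as follows, assuming the daily peak power series and each weather series are already aligned day by day with no missing values (days falling in data gaps would have to be dropped first). Pearson's r is computed from its textbook definition. The attribute names used in the usage example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r for two equal-length series (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def most_influential(peaks, weather):
    """weather: {attribute name: daily series aligned with peaks}; ranks by |r|."""
    scores = {name: abs(pearson(series, peaks)) for name, series in weather.items()}
    return max(scores, key=scores.get), scores
```

Using the absolute value means a strong negative relationship, such as higher peaks on colder days, ranks as highly as a positive one.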

Regression module
The aim of this module is to investigate the monthly change in EVs charging demand. A Growth Ratio (GR) index was defined as the ratio between the growth rate of EVs charging demand and the average monthly EVs charging demand. Linear regression analysis was applied to the monthly EVs charging demand time series to obtain the relationship between monthly EVs charging demand ($Y$, in kWh) and time ($X$, in months), given by Eq. (4):
$$Y = b_0 + b_1 X + e \tag{4}$$
where $b_0$ and $b_1$ are the constant regression coefficients and $e$ is the random disturbance (error).
The slope $b_1$ expresses the monthly growth rate of EVs charging demand (in kWh/month). The regression coefficients were calculated using the least squares method described in [44]. Given $b_1$, the GR index is calculated with Eq. (5):
$$GR = \frac{b_1}{E_{month}} \times 100\% \tag{5}$$
where $E_{month}$ is the average monthly EVs charging demand.
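The least squares fit of Eq. (4) and the GR index of Eq. (5) reduce to a few sums. The sketch assumes that the monthly energy values are consecutive months indexed X = 1, 2, ..., n, so months missing from the data would need separate handling. The function names are hypothetical, and GR is returned as a percentage.

```python
def linear_fit(y):
    """Ordinary least squares for y = b0 + b1 * x with x = 1..n (months)."""
    n = len(y)
    x = list(range(1, n + 1))
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1


def growth_ratio(monthly_kwh):
    """GR in percent: slope b1 divided by the average monthly demand."""
    _, b1 = linear_fit(monthly_kwh)
    return 100.0 * b1 / (sum(monthly_kwh) / len(monthly_kwh))
```

For example, monthly demands of 100, 110, 120 and 130 kWh give b0 = 90 kWh, b1 = 10 kWh/month and GR = 10/115, or about 8.7%.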

Fuzzy based characterisation model
The goal of this model was to characterise the EVs charging demand of a geographical area using information on the shape of the typical daily profile (k index), the predictability with respect to weather (r) and the trend of EVs charging demand (GR index). To this end, a "risk level" index was defined to aggregate the potential underlying risks from these characteristics. A fuzzy logic model was developed to capture the fuzziness of these risks and calculate the "risk level" index. Fuzzy logic models are useful for risk assessment under such conditions [45]. The Fuzzy Based Characterisation Model is illustrated in Fig. 4.
The validity of the risk characterisation model rests on the following considerations/assumptions:
i. The magnitude and duration of the peak of the typical EVs charging demand profile (captured by the k index) are underlying risk factors for the distribution network, as they affect transformer/circuit loading and the voltage profile.
ii. The change in EVs charging demand over time (described by the GR index) affects long-term decisions on network reinforcement planning. A rapid change in EVs charging demand over time in a geographical area is also a potential risk for the network's operation.
iii. The predictability of EVs charging demand with respect to weather in a geographical area (captured by r) affects the accuracy of a forecasting model. Decisions based on a forecast are subject to its accuracy, which represents a risk for the decision maker.
iv. Analysing the EVs charging demand characteristics in a geographical area allows the risks and uncertainties that will affect the mid-term normal operation of the corresponding distribution network to be assessed.
v. As no electric power network model was used to analyse the actual charging demand characteristics, this study quantifies only the relative risk between different geographical areas. The "risk level" index is not defined in absolute terms; it is used to rank these risks (due to EVs charging) among different geographical areas independently of their actual distribution networks.
The linguistic values used to express the input variables are Low (L), Medium (M) and High (H). Triangular membership functions are used to calculate the Degree-Of-Membership (DOM) for each of them, as shown in Figs. 5-7. Unlike other kinds of membership functions (e.g. trapezoidal), triangular membership functions have no flat plateau, so the DOM responds to every change in the input variable; this sensitivity was considered to improve accuracy.
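The degree of membership for a triangular function is a piecewise-linear calculation. The breakpoints below are hypothetical placeholders on an input normalised to [0, 1]; the actual shapes are those shown in Figs. 5-7.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling to 0 at c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Hypothetical breakpoints on an input normalised to [0, 1].
TERMS = {"L": (0.0, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.0)}


def fuzzify(x, terms=TERMS):
    """Degree of membership of x in each linguistic value."""
    return {name: tri(x, *abc) for name, abc in terms.items()}
```

With these breakpoints, an input of 0.25 belongs equally (0.5) to Low and Medium, and every small change in x changes both degrees.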
The output is fuzzified into nine fuzzy regions represented by linguistic variables; very very high (VVH), very high (VH), high (H), medium high (MH), medium (M), medium low (ML), low (L), very low (VL) and very very low (VVL), as shown in Fig. 8. The rule table is given in Table 4.
The design of the rule table is based on the assumption that each input indicator affects the "risk level" index equally. To the best of the authors' knowledge, no research work has quantified the level of influence of these indicators (k index, r and GR) on the operation of an electricity distribution network. Further investigation is necessary to understand the relative impact of these variables on the normal operation of an electricity distribution network, but this is beyond the scope of this paper.
Mamdani-type inference (also known as the max-min inference method) was used, which applies the minimum function for the implication of the rules and the maximum function for their aggregation. Defuzzification was performed using the centre of gravity (CoG) method [46-48]. This method finds the centre of the area under the aggregated output of all rules, so the risk level index $u$ is given by Eq. (6):
$$u = \frac{\int_{x_{min}}^{x_{max}} x \, g(x) \, dx}{\int_{x_{min}}^{x_{max}} g(x) \, dx} \tag{6}$$
where $x$ is the value of the "risk level" index, $x_{min}$ and $x_{max}$ define its range and $g(x)$ is the degree of membership value at $x$.
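The Mamdani max-min inference and the centre-of-gravity defuzzification of Eq. (6) can be sketched by discretising the output range. All of the specific choices here are assumptions made for illustration: the input breakpoints, the nine evenly spaced output triangles on [0, 1] and, above all, the rule table. That table is a hypothetical equal-weight rule (high k and GR raise the risk, high r lowers it) standing in for Table 4. The GR input is clipped to [0%, 50%], with negative values treated as 0% as in the case study, and then scaled to [0, 1].

```python
import itertools

LEVELS = ["L", "M", "H"]
OUT_TERMS = ["VVL", "VL", "L", "ML", "M", "MH", "H", "VH", "VVH"]


def tri(x, a, b, c):
    """Triangular membership: 0 at a, peak 1 at b, 0 at c."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def input_dom(x):
    # Hypothetical L/M/H breakpoints on a normalised [0, 1] input
    return {"L": tri(x, 0.0, 0.0, 0.5), "M": tri(x, 0.0, 0.5, 1.0), "H": tri(x, 0.5, 1.0, 1.0)}


def output_dom(term, x):
    # Hypothetical: nine evenly spaced triangles on [0, 1]
    centre = OUT_TERMS.index(term) / 8
    return tri(x, centre - 0.125, centre, centre + 0.125)


def rule(a, b, c):
    """Hypothetical equal-weight rule table: high k (a) and GR (c) raise risk, high r (b) lowers it."""
    score = LEVELS.index(a) + (2 - LEVELS.index(b)) + LEVELS.index(c)  # 0..6
    return OUT_TERMS[round(score * 4 / 3)]  # map 0..6 onto the nine output terms


def normalise_gr(gr_percent):
    # GR limited to [0 %, 50 %], negative values treated as 0 %
    return min(max(gr_percent, 0.0), 50.0) / 50.0


def risk_level(k_index, r, gr, n=1001):
    """Mamdani max-min inference with discretised centre-of-gravity defuzzification."""
    doms = [input_dom(k_index), input_dom(r), input_dom(gr)]
    xs = [i / (n - 1) for i in range(n)]
    agg = [0.0] * n
    for a, b, c in itertools.product(LEVELS, repeat=3):
        strength = min(doms[0][a], doms[1][b], doms[2][c])  # firing strength (AND = min)
        if strength == 0.0:
            continue
        term = rule(a, b, c)
        for i, x in enumerate(xs):
            # min implication clips the output set; max aggregates across rules
            agg[i] = max(agg[i], min(strength, output_dom(term, x)))
    total = sum(agg)
    return sum(x * g for x, g in zip(xs, agg)) / total if total else 0.0
```

With every input at the centre of its Medium set, only the (M, M, M) rule fires, and the result is the centre of the Medium output set, 0.5. Because the grid is uniform, its spacing cancels in the ratio, which makes the discrete sum a direct approximation of Eq. (6).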

Case study
The characterisation framework was applied to EVs charging data from three different geographical areas of the dataset. Charging events and weather data from the counties of Nottinghamshire, Leicestershire and West Midlands were analysed according to the proposed modelling framework. Fig. 9 shows the locations of the charging stations in the corresponding geographical areas.

Typical EVs charging demand profiles
The k-means clustering algorithm was applied, and the cluster centroids were obtained along with the number of daily vectors assigned to each. Using the Davies-Bouldin criterion, the optimal number of clusters was 5 for Leicestershire, 6 for Nottinghamshire and 3 for West Midlands. The results are shown in Figs. 10-12.
The typical daily EVs charging demand profiles for each area are presented in Fig. 13. The three typical profiles differ in peak magnitude, timing and duration. West Midlands shows the highest peak, but only for a very short period (between 10:00 and 12:00), and no charging events during the night. In contrast, the typical profiles of Nottinghamshire and Leicestershire have slightly lower peaks, but charging activity takes place throughout the whole day. The EVs charging load during the peak hours, the total daily charging load and their ratio k are summarised in Table 5. The last two columns of Table 5 give the total number of charging events and unique EVs drivers for each geographical area. As seen from Table 5, the proportion of energy required during peak hours is relatively low for all counties. A likely explanation is that the charging events occurred at public charging stations, which are expected to be used for recharging while EVs owners are at work, shopping or engaged in other activities. Since office hours are mostly between 09:00 and 17:00, the authors infer that most EVs owners return home after work, which may explain why the energy requirements at these stations are low during peak times.
Table 6 shows the absolute correlation coefficient (r) values between the weather attributes and the daily peak power of EVs charging demand. The most influential factor for all areas was temperature, with Mean Air Temperature having the highest absolute correlation coefficients. Leicestershire's EVs charging demand shows a medium linear correlation, whereas in Nottinghamshire and West Midlands the EVs charging demand has a weaker relationship with weather.

Influence of weather factors
As the above results show a dependency between EVs charging and Mean Air Temperature, it is useful to investigate the reasons for this relationship. Although such an investigation is beyond the scope of this analysis, the authors offer a possible explanation. In a northern country such as the UK the climate is relatively cold, so heating the interior of an electric vehicle increases its energy requirements.

Trend of EVs charging demand
The linear regression module described in Section 3 was applied to the EVs charging demand time series of the three counties to calculate their growth rates. Figs. 14-16 present the daily EVs charging demand of each county for the period 2012-2013. Noticeable gaps exist in the data, especially for Leicestershire and West Midlands. The total monthly EVs charging demand is illustrated in Fig. 17, along with the corresponding trend line for each county.
Using Eqs. (4) and (5), the regression coefficients of the trend line and the GR index were calculated for each county. The results are summarised in Table 7. Leicestershire shows the highest EVs charging demand growth rate. In contrast, the EVs charging demand in West Midlands decreases slightly over the two-year period.

"Risk Level" calculation
Once the Data Mining process is complete, the Fuzzy Based Characterisation Model uses the outputs of the Clustering, Correlation and Regression modules to calculate the "risk level" index of EVs charging demand for each geographical area. Table 8 summarises the input values for the characterisation model. Input A is the k index of each county's typical EVs charging demand profile, as calculated by the Clustering module. Input B is the absolute correlation coefficient (r) between the EVs charging demand and Mean Air Temperature (the most influential weather factor), whereas Input C is the GR index of the EVs charging demand (monthly basis). The membership function of Input C was assumed to accept values only in the range [0%, 50%]; negative GR indices were treated as a 0% increase. The outputs of the Fuzzy Based Characterisation Model for the three counties are presented in Table 9.
As seen from Table 9, the EVs charging demand in West Midlands has the lowest "risk level" index. Given the corresponding input values, this result is expected, as the EVs charging demand there has a descending trend (GR index) and low energy requirements during peak hours (k index). Leicestershire and Nottinghamshire, on the other hand, are assigned higher "risk level" values by the model. The similar values for these two areas are not unexpected: Leicestershire has a slightly higher growth ratio and peak-hour energy requirements, whereas the EVs charging demand in Nottinghamshire is less predictable (lower correlation coefficient).

Conclusions
A characterisation framework for EVs charging demand was developed. The model uses data analysis methods to extract information hidden in charging event data in order to identify the characteristics of the EVs charging load. This information was then used by a fuzzy based characterisation model to estimate the underlying relative risks for the distribution networks among different geographical areas, independently of their actual distribution networks. The framework was applied to a dataset of real charging events from three counties in the UK, and their "risk level" index was calculated.
The "risk level" index gives a spatial indication of the potential impact of the EVs charging demand on a distribution network in the mid-term future. Areas with a high "risk level" index are candidates for further investigation. However, the interpretation of this index is strongly influenced by the network characteristics. Other operational metrics of the corresponding network (e.g. maximum load capacity) should also be considered when planning possible network reinforcements. Charging strategies or other demand side management applications can be designed for an area according to its specific EVs charging load characteristics. For example, in areas where the EVs charging demand is high during peak times, a valley-filling strategy might be useful, whereas areas with random EVs charging events might need to invest in a different demand side management solution.
The generic design of the model makes it applicable at scales ranging from a county to a neighbourhood, as only minor changes are required to apply it to different datasets. In addition, with small modifications, the model can readily support the analysis of additional EVs charging demand characteristics.