THE ANALYSIS OF TERRORIST ATTACKS IN UKRAINE BASED ON AN OPEN-SOURCE DATA SET

The article presents the results of an exploratory data analysis of terrorist attacks in Ukraine based on the open-source data set "The Global Terrorism Database", created by the National Consortium for the Study of Terrorism and Responses to Terrorism. The data set contains information about 1,583 terrorist attacks in Ukraine and reflects the changes in the security situation in 2014-2015. The outcomes of the research can be used for planning counterterrorism and information operations. They range from basic information about which types of threats and tactics are prevalent to insights into which types of counterterrorism strategies are most likely to be effective.


Introduction
Formulation of the problem in general. The recent changes in the security and defence situation in Ukraine related to the occupation of Crimea and the Anti-Terrorist Operation (ATO) have created a new challenge for the Ukrainian Armed Forces (UAF): terrorist attacks (TA). A terrorist attack is defined as the "threatened or actual use of illegal force and violence by a non-state actor to attain a political, economic, religious, or social goal through fear, coercion, or intimidation" [1][2][3][4][5][6][7][8][9].
One approach to analysing TAs in Ukraine is to use "The Global Terrorism Database" (GTD), created by the National Consortium for the Study of Terrorism and Responses to Terrorism (START) in an effort to increase understanding of terrorist violence so that it can be more readily studied and defeated. The GTD is an open-source database that includes information on terrorist events around the world from 1970 through 2015, with more than 150,000 cases [6].
For each case it includes information on at least 45 features, and more recent incidents include information on more than 120 features. Overall, it covers more than 75,000 bombings, 17,000 assassinations, and 9,000 kidnappings since 1970 [6].
The GTD has been used extensively in scholarly publications, reports, and media articles, and the academic community regularly studies the data set in the context of international and domestic challenges. For example, there are studies of TAs in the United States and Nepal [8][9].
At the same time, an analysis of existing publications shows that there is currently no known study of TAs in Ukraine based on this data set.
The purpose of the article is to present the results of an exploratory data analysis of TAs in Ukraine based on the open-source GTD data set.

Statement of Materials Research
The information contained in the GTD is based on reports from a variety of open media sources. Unlike many other event databases, the GTD includes systematic data on domestic as well as transnational and international terrorist incidents that occurred during this period. It is based entirely on publicly available, unclassified source materials. These include media articles and electronic news archives and, to a lesser extent, existing data sets and secondary source materials such as books, journals, and legal documents.
With the expansion of online media, the GTD developers created a "hybrid" data collection strategy. It leverages automated processes (natural language processing, machine learning models) to sift through millions of news articles each month. Over 4,000,000 news articles and 25,000 news sources, covering any topic and published daily worldwide, were reviewed to collect incident data for 1998 to 2015 alone. Each month, GTD researchers at START review approximately 16,000 articles and identify attacks to be added to the GTD. It is currently the most comprehensive unclassified database on terrorist events in the world [5].
The GTD Codebook "Inclusion Criteria and Variables" outlines the features that constitute the GTD and defines their possible values. There are 9 groups of features: GTD ID and date, incident information, incident location, attack information, weapon information, target/victim information, perpetrator information, casualties and consequences, and additional information and sources.
The main analytical insights extracted from these groups of features are the following.
1. GTD ID and date. The version of the GTD used in this research was downloaded from the website www.kaggle.com and has 156,772 observations and 137 features for 206 countries. Ukraine accounts for 1,583 observations (about 1%), with 491 unique incident dates across 19 years of observations. For the first 16 years the number of attacks is less than 10. The situation changed dramatically in 2014, with 891 incidents, followed by 637 incidents in 2015. Thus, 1,528 (97%) of all attacks in Ukraine occurred in 2014-2015.
2. Incident information. There are 1,580 and 1,581 cases in which the incident met the inclusion criteria "Political, economic, religious, or social goal" and "Intention to coerce, intimidate or publicize to larger audiences", respectively. The criterion "Outside International Humanitarian Law" was met in 797 cases and not met in 786 cases.
In 375 cases the attack was part of a "multiple" incident, in 96 cases the duration of the incident extended beyond 24 hours, and in 373 cases the attack was part of a coordinated, multi-part incident.
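The yearly shares quoted above can be checked with a few lines of Python. This is only a sketch using the totals reported in the text. The column name "iyear" is assumed to be the year field of the GTD file, and the placeholder key "other years" is hypothetical, because the per-year split of the remaining incidents is not given here.

```python
from collections import Counter

def attacks_per_year(records):
    """Count incidents per calendar year from an iterable of dict-like rows."""
    # "iyear" is assumed to be the name of the year column in the GTD file.
    return Counter(int(row["iyear"]) for row in records)

def share_in_years(per_year, years):
    """Return (count, percentage) of incidents that fall in the given years."""
    total = sum(per_year.values())
    count = sum(per_year.get(y, 0) for y in years)
    return count, 100.0 * count / total

# Yearly totals reported in the text; the remaining incidents are lumped under
# a placeholder key because their per-year split is not given.
per_year = {2014: 891, 2015: 637, "other years": 1583 - 891 - 637}
count, pct = share_in_years(per_year, [2014, 2015])
print(count, round(pct, 1))  # 1528 96.5
```

The unrounded share is about 96.5%, which corresponds to the 97% reported above after rounding to whole percent.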

3. Incident location. There are 35 first-order subnational administrative regions and 330 names of cities, villages, or towns in which incidents occurred. The city with the largest total number of attacks is "Donetsk" (Table 1). In 462 cases the incident occurred in the immediate vicinity of the city, and in 164 cases the incident has additional information about the location (for example, "The incident occurred near the Donetsk Airport").
4. Attack information. Two features define the general method of attack and often reflect the broad class of tactics used (Table 2). A total of 1,423 attacks are classified as "successful" according to their tangible effects, and there are only 2 cases in which the perpetrator did not intend to escape from the attack alive.
5. Weapon information. The most frequent general weapon types and weapon sub-types used in the incidents are given in Table 3. The "Second Weapon Type" and "Second Weapon Sub-Type" features follow the same conventions, and their most frequent values are "Firearms", "Unknown Gun Type", "Automatic Weapon" and "Rifle/Shotgun (nonautomatic)".
There are 350 unique free-text details with pertinent information on the type of weapon(s) used in the incident (for example, "Artillery was used in the attack").
6. Target/victim information. The most frequent general types and sub-types of target/victim are given in Table 4. The main entities from the defence and security sectors are "Armed Forces of Ukraine", "State Border Guard Service of Ukraine", "Militsiya", and "National Guard of Ukraine".
There are 450 distinct values of the specific target/victim, that is, what was targeted and/or victimized as part of the entity named above. The most frequent specific targets/victims are "Soldiers", "Town", "Anti-Terrorist Operation Soldiers", "Checkpoint", and "Donetsk Sergey Prokofiev International Airport", and the most frequent values of the second target/victim type are "Private Citizens & Property", "Journalists & Media", and "Utilities".

7. Perpetrator information. There are 16 distinct values of "Perpetrator Group Name", the name of the group that carried out the attack, and the feature "Perpetrator Sub-Group Name" provides additional qualifiers or details about the names of these groups.

Fig 1. The total number of terrorists participating in the incidents
The total number of terrorists participating in the incidents is shown in Fig. 1. The most frequent values are 1-3 perpetrators, with "1" being the most common.
A group or person claimed responsibility for the attack in 94 cases (5.94%). The modes of claiming responsibility used by claimants might be useful for verifying authenticity and tracking trends in their behavior.
There are 71 distinct motives recorded for the attacks, for example "The specific motive is unknown; however, Ukrainian officials speculated that the bridge was destroyed in order to halt the advances of Ukrainian military forces in the region".
8. Casualties and consequences. The total number of fatalities is the number of confirmed fatalities for the incident, including all victims and attackers who died as a direct result of the incident. In most cases it is "0" (1,068 or 67.47%) or "1" (157 or 9.92%), with the exception of 3 incidents in which more than 100 people were killed (298, 201, and 143).
The total number of injured is the number of confirmed non-fatal injuries to both perpetrators and victims. In most cases it is "0" (963 or 60.83%) or "1" (135 or 8.53%), with the exception of 3 incidents in which more than 100 people were injured (reported values: 157 and 140).
The numbers of incidents with and without property damage are approximately equal ("yes" in 505 cases and "no" in 498 cases). In 308 known cases the extent of the property damage is defined as "Minor (likely < $1 million)".
There are 326 specific details about the property that was damaged in an attack, such as the type of vehicle that was destroyed, the areas or parts of a building that were damaged, or the types of assets that were stolen. For example, there are 26 cases where "A building was damaged in this attack".
In 111 cases in total, one or two victims were taken hostage or kidnapped and were held for between 1 and 8 days.
In 2 cases the country to which the kidnappers/hijackers diverted is "Russia". In one case the incident involved a demand for monetary ransom. There are also 9 ransom notes, such as "The assailants demanded the release of imprisoned pro-Russian separatists in exchange for the safe return of the hostages".
The most frequent eventual fate of hostages and kidnap victims is "Hostage(s) released by perpetrators" (57 or 3.60%).
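Most of the figures in the feature groups above are value counts with their shares of the 1,583 incidents. A minimal Python sketch of such a frequency table is given below. It rebuilds a synthetic column from the totals reported for the number of fatalities, it is not the code used in the study, and the variable name "nkill" is assumed to match the corresponding GTD column.

```python
from collections import Counter

def frequency_table(values, top=None):
    """Return (value, count, percent) rows sorted by descending count."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(v, n, round(100.0 * n / total, 2)) for v, n in counts.most_common(top)]

# Synthetic column reproducing the reported totals for the number of fatalities:
# 1068 zeros and 157 ones out of 1583 incidents. The remaining values are
# collapsed into a placeholder label because their breakdown is not given.
nkill = [0] * 1068 + [1] * 157 + ["other"] * (1583 - 1068 - 157)
rows = frequency_table(nkill)
for row in rows:
    print(row)
```

The rows for "0" and "1" reproduce the shares 67.47% and 9.92% quoted in the text.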
Feature selection. The data set contains many features, some of which are redundant and can be removed without much loss of information for predictive modeling. The clustering of features in the correlation matrix was analysed using the "network_plot()" function from the R package "corrr", which allows correlations to be explored through visualisation [10].
The plot shows a point for each feature rather than for each correlation. The proximity of the features to each other represents the overall magnitude of their correlations. Each path represents a correlation between the two features that it joins. The color, width and transparency of the line represent the strength of the correlation (fig.3, tabl.5). The research was carried out using the following R packages: "tidyverse", "data.table", "feather", "cluster", "corrplot", "shiny", "ggplot2" [10][11][12].
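The idea behind the correlation network can be illustrated without the R tooling. The Python sketch below computes pairwise Pearson correlations and keeps only the pairs whose absolute correlation exceeds a threshold, which are the pairs such a plot would join with a path. It is an illustration of the principle, not the code used in the study. The threshold name "min_cor", the values 0.3 and 0.9, the helper names and the toy columns are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / sqrt(sxx * syy)

def correlation_edges(columns, min_cor=0.3):
    """List (feature_a, feature_b, r) for pairs with |r| >= min_cor, strongest first."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= min_cor:
            edges.append((a, b, r))
    return sorted(edges, key=lambda e: -abs(e[2]))

def redundant_features(edges, cutoff=0.9):
    """From each pair correlated above the cutoff, mark the second feature for removal."""
    return sorted({b for a, b, r in edges if abs(r) >= cutoff})

# Toy numeric columns (not GTD data), named after features discussed above.
columns = {
    "nkill": [0, 1, 2, 3, 4],
    "nwound": [0, 2, 4, 6, 9],
    "success": [1, 0, 1, 0, 1],
}
edges = correlation_edges(columns)
print(edges)
print(redundant_features(edges))
```

Dropping one feature from each strongly correlated pair is only one simple selection rule. Which of the two features to keep is a modeling decision that this sketch does not address.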

Conclusions
The analysis of the open-source GTD data set is a new opportunity to improve Ukrainian security and defence decision support information systems. The outcomes of the research can be used for planning counterterrorism and information operations.
They range from fairly basic information about which types of threats and tactics are prevalent across various jurisdictions and how they vary over time, to more sophisticated analyses that attempt to provide insights into which types of counterterrorism strategies are most likely to be effective in a given context.

Feature extraction.
As part of the feature engineering process, new features were extracted from existing ones. K-means, the most widely used partitioning clustering method, was applied to identify new "cluster" features. The optimal number of clusters is 4. It was determined using the Elbow method, the Bayesian Information Criterion for k-means, the optimum average silhouette width criterion, and the Calinski criterion for the optimal number of clusters (fig.3).
The main promising directions for further research are advanced analyses of the GTD feature groups by cities, targets, regions, perpetrators, attacks, and weapons, including natural language processing of text features, and the development of predictive models based on this data set.
The selected and extracted features are the basis for further predictive modeling and for deriving analytical insights and predictions.
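The Elbow method mentioned in this section can be sketched as follows. The Python code runs a plain K-means (Lloyd's algorithm) for several values of k and reports the within-cluster sum of squares (WSS), and the "elbow" is the value of k after which WSS stops decreasing sharply. This is a minimal illustration on toy data under stated assumptions, not the procedure used in the study. The function names, the number of restarts and the toy points are hypothetical, and the other criteria (Bayesian Information Criterion, silhouette width, Calinski) are not covered.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            new.append(mean(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    wss = sum(dist2(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, wss

def elbow_curve(points, k_max, restarts=10):
    """Best within-cluster sum of squares found for each k = 1..k_max."""
    return {k: min(kmeans(points, k, seed=s)[2] for s in range(restarts))
            for k in range(1, k_max + 1)}

# Toy data (not GTD data): four tight groups of four points each.
offsets = [(-0.5, 0.0), (0.5, 0.0), (0.0, -0.5), (0.0, 0.5)]
centers = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
curve = elbow_curve(points, k_max=6)
for k, wss in curve.items():
    print(k, round(wss, 2))
```

Several random restarts are used because Lloyd's algorithm converges only to a local optimum. The cluster labels returned by the chosen run are what would be stored as the new "cluster" feature.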

Table 1
Top 10 cities by the number of attacks

Table 3
Most popular general type of weapon and weapon sub-types used in the incident

Table 5
Text description correlations between features