Criminal behavioral data analysis for recidivation estimation in convicted offenders

Recidivism is the act of continuing to commit crimes after being imprisoned for a first-time offence and released. The level of delinquent behavior in an individual, which is closely associated with repeated recidivism, can be determined by assessing offenders' behavioral features. Data were collected from 220 offenders, of whom 204 participants were used to create the final dataset. The raw information was acquired using a questionnaire form that covered personality traits, parental and family characteristics, socio-demographic characteristics, crime details, cumulative jail behavior elements, and the HCR-20 risk assessment technique. Behavioral samples were gathered from several jails and correctional facilities in the Indian state of Jharkhand for the purpose of initial relapse estimation among first-time convicts in the current study. The dataset can be used by criminologists, sociologists, psychologists, and academicians to determine an offender's pattern and psychological qualities. Specialists in the detention centres undertook the felony evaluation.


Value of the Data
• The dataset can be used by criminologists, sociologists, psychologists, and academicians to determine an offender's pattern and psychological qualities.
• Policymakers, probation officers, and the criminal justice system can use the data to make better-informed judgements on felony convictions, parole, bail, probation, and penal facilities for first-time offenders.
• The data can be used to identify high-risk criminals, repeat offenders with aggressive views, and people with anti-social personalities, which can help enhance crime prevention programmes.
• The information can be used to tailor vocational and skill training to the needs of offenders.
• The information can be used to track down criminals, analyse their risks, and predict recidivism.
• The dataset can also help lower crime rates by supporting proper rehabilitation, identifying crime patterns, and determining how offenders can reintegrate into society.

Data Description
First-time offenders are those convicted of any illicit or unlawful act under the Indian Penal Code. This article is based on a study of 220 criminals from the Indian state of Jharkhand [1] who had been convicted at least once in the previous five years; 204 of them were retained as participants in the final dataset, as shown in Table 1.
The survey questionnaires were filled in by clinical psychologists, and the data were verified and authenticated by the panel of psychologists listed in Table 2.
The survey questionnaire covers different behavioral factors of each offender, such as personality, parental and family, environmental, demographic, socio-economic, and offence-detail factors, together with the standard risk assessment tool HCR-20 [2] and cumulative prison behavior factors. The prison behavior, crime details, and offence frequency of each participant were also collected and tabulated in Table 3.

Experimental design
The experimental setup consists of data verification and validation, data quality control, and data or feature exclusion. A quantitative approach [3] based on a semi-structured proforma survey questionnaire was employed to collect data from first-time offenders who had been convicted once for any criminal act under the Indian Penal Code. The questionnaire was developed from previous research and was verified and validated by an expert panel. The panel consists of six members with relevant experience in the study area: academicians, researchers, a criminal lawyer, and a subject expert, with at least five years of experience in the relevant field.
The panel's suggestions and comments were taken into consideration for face validity [4] and content validity. Face validity relates to a researcher's subjective judgement of whether the items in an instrument appear relevant, clear, and reasonable, whereas content validity [5] examines how accurately the instrument covers the domain being assessed.
Accuracy can also be assessed with statistical software and tests such as SPSS and ANOVA [6]. We have also built an architecture (Fig. 1, Data Analysis) that applies machine learning models to the datasets to generate data visualizations and final reports. In our datasets, the demographic and socio-economic data were collected together with the standard risk assessment tool HCR-20, which was customised to the requirements of the research. Previous databases contain either the demographic profile or the personality traits of an individual for behavior assessment [7], rather than both. Furthermore, a supervised machine learning approach can be used to estimate the risk of reoffending among first-time offenders (FTOs).
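As an illustration of the supervised approach mentioned above, the following minimal sketch trains a logistic regression classifier by batch gradient descent using only the Python standard library. It is not the authors' model: logistic regression is a stand-in chosen for brevity, and the two input features (a normalised prison-behavior score and a normalised HCR-20 total), the toy records, the learning rate and the epoch count are all hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(w, b, x):
    """Estimated probability of reoffending for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi  # predicted probability minus label
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradients over all records and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy records: [prison_behavior_score, hcr20_total], both scaled to 0-1;
# label 1 = reoffended, 0 = did not reoffend.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
print(f"risk for [0.85, 0.75]: {predict_risk(w, b, [0.85, 0.75]):.2f}")
```

On the real dataset, a held-out test set or cross-validation would be needed before reading anything into the estimated risks.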

Data quality control
To ensure quality, each form was thoroughly verified by the expert panel, and data were collected multiple times to minimise bias in the process: the survey questionnaire was filled in four times over a span of 3 months.
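One simple way to check whether repeated administrations agree, sketched below under assumptions, is the share of participants who gave the same answer to an item in all four rounds. The function name, the categorical coding of answers and the example responses are hypothetical; the source does not state how agreement across the four rounds was measured.

```python
def item_agreement(responses):
    """Fraction of participants whose answers to one item are identical across all rounds.

    `responses` maps a participant ID to that participant's answers, one per round.
    """
    if not responses:
        raise ValueError("no responses given")
    consistent = sum(1 for answers in responses.values() if len(set(answers)) == 1)
    return consistent / len(responses)


# Hypothetical answers to a single item over four administrations.
responses = {
    "P001": ["yes", "yes", "yes", "yes"],
    "P002": ["no", "no", "yes", "no"],
    "P003": ["no", "no", "no", "no"],
    "P004": ["yes", "no", "yes", "yes"],
}
print(f"agreement: {item_agreement(responses):.2f}")
```

Items with low agreement would be candidates for re-checking with the participant or the panel.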

Exclusion criteria
All participant information in the survey questionnaire required validation, which could only be done by field experts. The panel, comprising experts from different fields, was also responsible for excluding features that were less prominent or less useful for the current study. The insignificant attributes are listed in Table 4. The experts assigned a score to every parameter and excluded the insignificant parameters that had little impact on the study.
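The scoring-based exclusion described above can be sketched as follows, assuming each of the six experts rates every attribute on a 1-5 scale and attributes whose mean rating falls below a cutoff are excluded. The rating scale, the cutoff of 3.0 and the attribute names are hypothetical; the attributes actually excluded are those in Table 4.

```python
from statistics import mean


def exclude_low_scoring(scores, cutoff=3.0):
    """Split attributes into kept and excluded lists by their mean expert rating.

    `scores` maps an attribute name to the ratings given by each expert.
    """
    kept, excluded = [], []
    for attribute, ratings in scores.items():
        (kept if mean(ratings) >= cutoff else excluded).append(attribute)
    return kept, excluded


# Hypothetical ratings from six experts for three attributes.
scores = {
    "item_a": [5, 4, 5, 4, 5, 4],
    "item_b": [2, 1, 2, 3, 2, 1],
    "item_c": [3, 3, 4, 3, 3, 3],
}
kept, excluded = exclude_low_scoring(scores)
print("kept:", kept)
print("excluded:", excluded)
```

A mean is used here for simplicity; a median or a minimum-agreement rule would be equally plausible choices.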

Research design
A semi-structured proforma survey questionnaire was designed to collect information from each participant who had been convicted once (first-time offender, FTO) in the state of Jharkhand, India, which currently accounts for about 2 percent of India's prison population [8]. Jharkhand ranked 16th among 32 states of the country.

Sample and location of study
A sample is a smaller set of items selected to represent the characteristics of a larger population. The current study's sample size could be increased to provide more reliable results. Random sampling was used to select the 204 individuals in the dataset, in order to reduce bias. Significance was assessed at the 0.05 level (p < 0.05), and standard statistical analysis was performed to separate significant from insignificant features in the dataset. An ANOVA test was used to identify the most significant features, which are listed in Table 5. As the table shows, these characteristics are correlated with re-offending. The study was located in Jharkhand, in eastern India, which accounts for around 2 percent of all convicted criminals in India [9].
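The ANOVA screening described above can be sketched in plain Python as a one-way ANOVA per feature: each feature's values are grouped by outcome, and features with p < 0.05 are kept. The p-value uses the F-distribution survival function expressed through the regularized incomplete beta function (continued-fraction evaluation as in Numerical Recipes), so no third-party package is needed; `scipy.stats.f_oneway` computes the same one-way ANOVA F statistic and p-value. The feature names, the toy values and the reoffended/not-reoffended grouping are hypothetical.

```python
import math


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def _beta_cf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f_stat, d1, d2):
    """P(F > f_stat) for an F(d1, d2) random variable."""
    return regularized_beta(d2 / (d2 + d1 * f_stat), d2 / 2.0, d1 / 2.0)


def significant_features(features, labels, alpha=0.05):
    """Return {feature: p} for features whose group means differ at level alpha."""
    selected = {}
    for name, values in features.items():
        groups = [[v for v, lab in zip(values, labels) if lab == g] for g in sorted(set(labels))]
        f_stat, d1, d2 = one_way_anova(groups)
        p = f_sf(f_stat, d1, d2)
        if p < alpha:
            selected[name] = p
    return selected


# Hypothetical toy data: 1 = reoffended, 0 = did not reoffend.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "feature_x": [1, 2, 3, 4, 5, 6],  # group means differ
    "feature_y": [3, 5, 4, 4, 3, 5],  # identical group means
}
print(significant_features(features, labels))
```

With many features screened at once, a multiple-comparison correction would normally be considered as well.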

Procedure
The questionnaire data were collected and filled in by clinical psychologists with the consent of each participant. Each participant was informed that the data collected in the questionnaire would be kept fully confidential. Every question was briefly explained to the participants so that they could answer appropriately. Completing the full form took around 30-35 minutes per participant.

Data analysis
Data analysis was based on the expert panel's descriptive analysis and was carried out using standard statistical methods.
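A descriptive summary of the kind mentioned above can be produced with Python's `statistics` module: mean, sample standard deviation and range for numeric fields, and relative frequencies for categorical ones. The field names and values below are hypothetical, and the choice of summary measures is an assumption rather than the authors' stated procedure.

```python
from collections import Counter
from statistics import mean, stdev


def describe_numeric(values):
    """Mean, sample standard deviation, minimum and maximum of a numeric field."""
    return {"mean": mean(values), "sd": stdev(values), "min": min(values), "max": max(values)}


def describe_categorical(values):
    """Relative frequency of each category, most common first."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.most_common()}


# Hypothetical fields for six participants.
age_at_first_conviction = [20, 24, 24, 28, 32, 34]
education_level = ["primary", "secondary", "primary", "none", "secondary", "primary"]

print(describe_numeric(age_at_first_conviction))
print(describe_categorical(education_level))
```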

Ethics Statement
The current study was conducted in compliance with the Declaration, and all subjects gave their informed consent. The study was approved by the Ethics Committee of the relevant institution, Birla Institute of Technology, Ranchi, India (No. CSE/PHD/2017/09).

Declaration of Competing Interest
The authors declare that they have no significant competing interests or personal financial concerns that could have influenced the results of this research.

Data Availability
First Time Offender (Original data) (Mendeley Data).