Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective

Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the convention of human rights. Textual information is represented using contiguous word sequences, i.e. N-grams, and topics. Our models can predict the court's decisions with a strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis.


In his prescient work on investigating the potential use of information technology in the legal domain, Lawlor surmised that computers would one day become able to analyse and predict the outcomes of judicial decisions (Lawlor, 1963). According to Lawlor, reliable prediction of the activity of judges would depend on a scientific understanding of the ways that the law and the facts impact on the relevant decision-makers, i.e. the judges. More than fifty years later, the advances in Natural Language Processing (NLP) and Machine Learning (ML) provide us with the tools to automatically analyse legal materials, so as to build successful predictive models of judicial outcomes.

In this paper, our particular focus is on the automatic analysis of cases of the European Court of Human Rights (ECtHR or Court). The ECtHR is an international court that rules on individual or, much more rarely, State applications alleging violations by some State Party of the civil and political rights set out in the European Convention on Human Rights (ECHR or Convention). Our task is to predict whether a particular Article of the Convention has been violated, given textual evidence extracted from a case, which comprises specific parts pertaining to the facts, the relevant applicable law and the arguments presented by the parties involved. Our main hypotheses are that (1) the textual content, and (2) the different parts of a case are important factors that influence the outcome reached by the Court.

Manuscript to be reviewed

Computer Science
These hypotheses are corroborated by the results. Our work lends some initial plausibility to a text-based approach with regard to ex ante prediction of ECtHR outcomes on the assumption, defended in later sections, that the text extracted from published judgments of the Court bears a sufficient number of similarities with, and can therefore stand as a (crude) proxy for, applications lodged with the Court as well as for briefs submitted by parties in pending cases. We submit, though, that full acceptance of that reasonable assumption necessitates more empirical corroboration. Be that as it may, our more general aim is to work under this assumption, thus placing our work within the larger context of ongoing empirical research in the theory of adjudication about the determinants of judicial decision-making. Accordingly, in the discussion we highlight ways in which automatically predicting the outcomes of ECtHR cases could potentially provide insights on whether judges follow a so-called legal model (Grey, 1983) of decision-making or whether their behavior conforms to the legal realists' theorization (Leiter, 2007), according to which judges primarily decide cases by responding to the stimulus of the facts of the case.

We define the problem of the ECtHR case prediction as a binary classification task. We utilise textual features, i.e. N-grams and topics, to train Support Vector Machine (SVM) classifiers (Vapnik, 1998). We apply a linear kernel function that facilitates the interpretation of models in a straightforward manner. Our models can reliably predict ECtHR decisions with high accuracy, i.e. 79% on average. Results indicate that the 'facts' section of a case best predicts the Court's actual decision, which is more consistent with legal realists' insights about judicial decision-making. We also observe that the topical content of a case is an important indicator of whether there is a violation of a given Article of the Convention or not.

Previous work on predicting judicial decisions, representing disciplinary backgrounds in political science and economics, has largely focused on the analysis and prediction of judges' votes given non-textual information, such as the nature and the gravity of the crime or the preferred policy position of each judge (Kort, 1957; Nagel, 1963; Keown, 1980; Segal, 1984; Popple, 1996; Lauderdale and Clark, 2012). More recent research shows that information from texts authored by amici curiae¹ improves models for predicting the votes of the US Supreme Court judges (Sim et al., 2015). Also, a text mining approach utilises sources of metadata about judges' votes to estimate the degree to which those votes are about common issues (Lauderdale and Clark, 2014). Accordingly, this paper presents the first systematic study on predicting the decision outcome of cases tried at a major international court by mining the available textual information.
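To make the N-gram representation used as classifier input more concrete, the following is a minimal sketch of counting contiguous word sequences in a piece of text. The function name `word_ngrams`, the whitespace tokenisation, the lowercasing and the default maximum length `n_max=2` are illustrative assumptions, not details taken from the study.

```python
from collections import Counter


def word_ngrams(text, n_max=2):
    """Count contiguous word sequences of length 1..n_max in a text."""
    # Assumption: lowercasing and whitespace splitting stand in for
    # whatever tokenisation a real pipeline would use.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sentence, used only to illustrate the feature counts.
features = word_ngrams("the applicant was detained and the applicant complained")
print(features["the applicant"])
```

Each case section would then be described by a vector of such counts, which a linear classifier can weight directly; this is one reason N-gram features pair naturally with an interpretable linear model.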

Overall, we believe that building a text-based predictive system of judicial decisions can offer lawyers and judges a useful assisting tool. The system may be used to rapidly identify cases and extract patterns that correlate with certain outcomes. It can also be used to develop prior indicators for diagnosing potential violations of specific Articles in lodged applications and eventually prioritise the decision process on cases where violation seems very likely. This may reduce the significant delay imposed by the Court and encourage more applications by individuals who may have been discouraged by the expected time delays.

The ECtHR is an international court set up in 1959 by the ECHR. The court has jurisdiction to rule on the applications of individuals or sovereign states alleging violations of the civil and political rights set out in the Convention. The ECHR is an international treaty for the protection of civil and political liberties in

criteria. The criteria pertain to a number of procedural rules, chief amongst which is the one on the

¹ An amicus curiae (friend of the court) is a person or organisation that offers testimony before the Court in the context of a particular case without being a formal party to the proceedings.

To these correspond, for the same year, 891 judgments on the merits. Moreover, cases held inadmissible or struck out are not reported, which entails that a text-based predictive analysis of them is impossible.

It is important to keep this point in mind, since our analysis was solely performed on cases retrievable through the electronic database of the court, HUDOC³. The cases analysed are thus the ones that have already passed the first admissibility stage⁴, with the consequence that the Court decided on these cases' merits under one of its formations.

Main Premise
Our main premise is that published judgments can be used to test the possibility of a

We create a data set⁷ consisting of cases related to Articles 3, 6, and 8 of the Convention. We focus on these three articles for two main reasons. First, these articles provided the most data we could automatically scrape. Second, it is of crucial importance that there should be a sufficient number of cases available, in order to test the models. Cases from the selected articles fulfilled both criteria.

The models are trained and tested by applying a stratified 10-fold cross validation, which uses a held-out 10% of the data at each stage to measure predictive performance. The linear SVM has a regularisation parameter of the error term C, which is tuned using grid-search. For Articles 6 and 8, we use the Article 3 data for parameter tuning, while for Article 3 we use Article 8.
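The evaluation protocol described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names (`stratified_folds`, `accuracy`, `cross_validate`), the toy labels, and the majority-class baseline are hypothetical, and the baseline merely stands in for the linear SVM, whose training and grid search over C are not reproduced here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each example index to one of k folds, preserving class balance."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy(predicted, actual):
    """Fraction of cases whose predicted label matches the actual label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def cross_validate(labels, fit_predict, k=10):
    """Mean held-out accuracy over k stratified folds."""
    scores = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predicted = fit_predict(train, held_out)
        scores.append(accuracy(predicted, [labels[i] for i in held_out]))
    return sum(scores) / len(scores)


# Toy labels (hypothetical): +1 = violation, -1 = no violation.
labels = [1] * 50 + [-1] * 50


def majority_baseline(train, test):
    # Stand-in for the linear SVM: predict the most common training label.
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(test)


mean_acc = cross_validate(labels, majority_baseline)
print(mean_acc)
```

Stratification keeps the proportion of violation and non-violation cases roughly constant in every fold, so the accuracy measured on each held-out tenth is comparable across folds. In a fuller pipeline, `majority_baseline` would be replaced by a function that fits the classifier on the training indices with a value of C chosen beforehand.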

Predictive Accuracy
We compute the predictive performance of both sets of features on the classification of the ECtHR cases. Performance is computed as the mean accuracy obtained by 10-fold cross-validation. Accuracy is the proportion of cases for which the predicted outcome matches the Court's actual judgment.

Table 2 shows the accuracy of each set of features across articles using a linear SVM. The rightmost column also shows the mean accuracy across the three articles. In general, both N-gram and topic features achieve good predictive performance. Our main observation is that both language use and topicality are important factors that appear to stand as reliable proxies of judicial decisions. Therefore, we take a further look into the models by attempting to interpret the differences in accuracy.

We observe that 'Circumstances' is the best subsection to predict the decisions for cases in Articles 6

cases that the Court deems inadmissible, concluding to a judgment of non-violation. In these cases, the judgment of the Court is more summary than in others.

We also observe that the predictive accuracy is high for all the Articles when using the 'Topics' as

The consistently more robust predictive accuracy of the 'Circumstances' subsection suggests a strong correlation between the facts of a case, as these are formulated by the Court in this subsection, and the

These results could be understood as providing some evidence for judicial decision-making approaches according to which judges are primarily responsive to non-legal, rather than to legal, reasons when they decide appellate cases. Without going into details with respect to a particularly complicated debate that is out of the scope of this paper, we may here simplify by observing that since the beginning of the 20th century, there has been a major contention between two opposing ways of making sense of judicial decision-making: legal formalism and legal realism (Posner, 1986; Tamanaha, 2009; Leiter, 2010

legal rules or use more complex legal reasoning than deduction whenever legal rules are insufficient to warrant a particular outcome (Pound, 1908; Kennedy, 1973; Grey, 1983; Pildes, 1999

First, topic 13 in Table 3

independently well-established trends in the case law without recourse to expert legal/doctrinal analysis.

The above observations need to be qualified with respect to a (small) number of topics. For instance, the most representative cases for topic 8 in Table 3 were not particularly informative. This is because these were cases involving a person's death, in which claims of violations of Article 3 (inhuman and degrading treatment) were only subsidiary: this means that the claims were mainly about Article 2, which protects the right to life. In these cases, the absence of a violation, even if correctly

We presented the first systematic study on predicting judicial decisions of the European Court of Human Rights using only the textual information extracted from relevant sections of ECtHR judgments. We framed this task as a binary classification problem, where the training data consists of textual features extracted from given cases and the output is the actual decision made by the judges.

Apart from the strong predictive performance that our statistical NLP framework achieved, we

Finally, we believe that our study opens up avenues for future work, using different kinds of data (e.g.