The German Federal Courts Dataset 1950–2019: From Paper Archives to Linked Open Data

Various reasons explain why Europe lags behind the United States in empirical legal studies. One of them is a scarcity of available data on judicial decision making, even at the highest levels of adjudication. By institutional design, civil‐law judges have lower personal profiles than their common‐law counterparts. Hence very few empirical data are available on how courts are composed and how that composition changes over time. The present project remedies that by easing access to such data and lowering the threshold for empirical studies on judicial behavior. This paper introduces the German Federal Courts Dataset (GFCD) as a resource for empirical legal scholars, with the objective of inspiring more European lawyers to engage with empirical aspects of civil‐law adjudication. To that end, several thousand pages of German court documentation were digitized, transcribed into machine‐readable tables (ready to be imported into statistics software), and published online (www.richter-im-internet.de). To simultaneously explore innovative ways of sharing public‐domain datasets, the data were modeled as linked open data and imported into the Wikidata repository for use in any computational application.

This renewed interest requires data on the internal structures of the judiciary; yet there is no requirement in Europe for courts even to publish their decisions, let alone to acknowledge and systematically document the judges sitting on the bench. European researchers are therefore left "largely in the dark" (Hönnige & Gschwend 2010:513) about the inner workings of even top-tier courts. This scarcity of data not only impinges on people's fundamental "Right of Public Access to Legal Information" (Mitee 2017) and hampers research within civil-law jurisdictions, but also precludes research comparing the court systems of common- and civil-law jurisdictions.
To overcome these limitations and provide a resource for empirical legal studies in and on Europe, this article introduces the German Federal Courts Dataset (GFCD) with data and code book available from www.richter-im-internet.de. This is a public "multi-user dataset" (in the spirit of Epstein & Martin 2014:15) on the composition of the federal judiciary in Germany, as a major civil-law country, over the last 70 years. By releasing documents and court data from as far back as the German federal courts' (re-)establishment after World War II, it enables researchers to study some of the inner workings of civil-law adjudication. The preprocessing that went into creating this resource was described in detail in two German reports (Hamann 2017; Hamann & Nest 2018), so the present paper will primarily summarize its scope and user interfaces (Section I.D), after familiarizing readers with its context (Section I.B) and previous literature (Section I.C).

B. Background: The German Federal Judiciary
Germany has seven federal courts that are regionally dispersed, but centralized in their respective subject matter authority. Consider Table 1: each court's jurisdiction extends to one area of law (e.g., labor law, tax law, administrative law) in which it will hear all cases escalated up by lower (state) courts. Civil and criminal cases are heard by the same court (BGH), but they are separated in Table 1 for convenience.
Unlike most state courts in Germany, federal courts usually decide through judicial panels, which are referred to as "senates" (but more properly translated as "divisions," following Siehr 1977; Meador 1981:n. 25) and consist of five or more judges. Each division has jurisdiction over a specified set of subject matters (Zuständigkeiten, competencies). To ensure a transparent rule of law, comprehensive lists of these competencies are published, along with the division's complete staffing (Besetzung, lineup), at the beginning of each calendar year.
Since case allocation plans provide data on the composition and tasks of each court division, they can be used to enrich other datasets where court divisions serve as the unit of analysis, including collections (corpora) of case decisions. To connect case corpora with the information contained in case allocation plans, researchers use the "docket identifier" (Aktenzeichen) of a given case to determine to which division the case was assigned. Courts assign this identifier following a countrywide uniform syntax that was standardized by a 1934 Prussian ordinance (Aktenordnung). Every docket identifier encodes (at least) four pieces of information, from left to right:
1. The court division to which the case is assigned, as an Arabic or Roman numeral.
2. The subject matter or case type, as a string of mostly uppercase letters. This varies widely between branch, hierarchy level, and location of the court. For instance, "C" could denote one of five case types, depending on which court assigns this string.
3. A running count that starts with 1 each year and increases for every new case.
4. A slash (/) followed by the two-digit year (YY) in which the case came to court.
NOTE: … (2019) total of court divisions. "Docket identifier" is for a randomly selected case from recent years; "Div." denotes, for this particular case, the court division to which it has been assigned.
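The four-part syntax described above lends itself to mechanical parsing. The following Python sketch shows one possible way to split a docket identifier into its components; it is not the procedure used in the project. The regular expression, the function names, the pivot rule for expanding two-digit years, and the sample identifiers in the usage note are assumptions made for illustration. Identifiers with additional leading or trailing letters (as at the BSG) are deliberately not handled.

```python
import re

# Assumed pattern: division (Roman or Arabic numeral), case-type letters,
# running count, slash, two-digit year. Illustrative only.
DOCKET_RE = re.compile(
    r"^(?P<division>[IVXLC]+|\d+)\s+"
    r"(?P<case_type>[A-Za-z]+)\s+"
    r"(?P<count>\d+)/(?P<year>\d{2})$"
)

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'IX' to an integer."""
    values = [ROMAN_VALUES[char] for char in numeral]
    total = 0
    for i, value in enumerate(values):
        # A value that precedes a larger one is subtracted (the I in IX).
        if i + 1 < len(values) and values[i + 1] > value:
            total -= value
        else:
            total += value
    return total


def parse_docket(identifier, pivot=50):
    """Split a four-part docket identifier; return None if it does not fit."""
    match = DOCKET_RE.match(identifier.strip())
    if match is None:
        return None
    division = match["division"]
    division_no = int(division) if division.isdigit() else roman_to_int(division)
    yy = int(match["year"])
    # Assumption: two-digit years from `pivot` upward belong to the 1900s,
    # all others to the 2000s (the federal courts date from the 1950s).
    year = (1900 if yy >= pivot else 2000) + yy
    return {
        "division": division_no,
        "case_type": match["case_type"],
        "count": int(match["count"]),
        "year": year,
    }
```

For a made-up identifier such as "XI ZR 123/15", `parse_docket` yields division 11, case type "ZR", running count 123, and year 2015. The division and year can then serve as a key into the case allocation plan of that year.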

Consider the docket identifier examples in Table 1: they all consist of these four parts (except in the case of BSG, which adds additional letters to the beginning and end). The fourth part reveals that all cases were brought in 2015 or 2016; the first part corresponds to the division identified in the last column of Table 1. The strings in the middle part can be interpreted using coding schemes explained on each court's website, or in comprehensive reference guides such as www.gerichtsaktenzeichen.de.

C. Literature: Case Allocation Plans in Previous Research
The mechanisms of subject matter organization in the German judiciary have long been of interest to comparative law scholars in the United States. Case allocation plans were extensively described and analyzed over the last 40 years in U.S. legal literature (Siehr 1977:672; Meador 1981:44-49; Clark 1988:1838; Dammann 2004:539; Lichtmanegger 2019), with at least one such plan fully translated and appended to a thorough analysis of "The German Design" of judicial subject matter organization (Jewell 1981).
From this longstanding doctrinal interest in European judicial organization, attention has more recently shifted to the potential of case allocation plans to serve empirical research. A number of social scientists have demanded and used case allocation plans to study the behavioral mechanisms of judicial decision making in Germany.

Hönnige and Gschwend (2010) reviewed existing theoretical and empirical literature on the Federal Constitutional Court (BVerfG), identifying a lack of empirical research on the court's institutional structures, the actors involved, and the different stages of their decision-making procedures. Both existing research approaches (which the authors labeled "interpretative sociological" and "rational choice" institutionalism) were found to suffer from a lack of data: "Our knowledge of the Federal Constitutional Court is still limited, as Christian van Ooyen noted. So far there are no systematic biographical overviews … [and] while the judicial staffing of the court's last 25 years can still be reconstructed from freely accessible sources such as quality press media …, its first quarter-century is still largely in the dark" (2010:513). This call for datasets was expressly taken as an inspiration for creating the German Federal Courts Dataset (see Hamann 2017:n. 2; www.richter-im-internet.de/edition).

Gschwend et al. (2016) studied whether Constitutional Court judges are "political animals" by using random shocks to the bench composition as quasi-experiments: in cases where judges drop out from their respective division due to illness or other chance events, the rules of procedure require them to be replaced by judges from another division. By coding both which judges actually signed decisions between 1998 and 2011 and which judges were responsible according to the original case allocation, Gschwend et al. identified quasi-random variations that they interpreted as causal for the division's decision making. At the time of this study, the German Federal Courts Dataset had not yet been released, but the study's first author oversaw the dataset's creation as a member of its academic supervisory board (listed at www.richter-im-internet.de/edition), so that follow-up studies are freed from recoding the 13 years of procedural rules, and could potentially extend the period under consideration.

Stern (2016) explored methods to measure the policy preferences of all judges at the German Federal Court of Justice since 1950. The author used various sources (some of the same ones on which the present project relies), and referred to 48 case allocation plans in order to identify the 540 judges then sitting on the Court of Justice. He then developed preference scores from the observable political composition of the committee through which each judge had been elected, in order to derive the judge's own policy preference, on the assumption that it stably represents the electing actors' preferences. Many materials needed for this analysis are now integrated in the present dataset.
Wittig (2016) analyzed, using a hand-collected dataset, the occurrence of dissenting votes at the German Constitutional Court. In order to analyze judges' votes and instances of dissent empirically, the author required data on the judge rapporteur in each of 5,290 cases in her dataset. She thus turned to case allocation plans: "Knowing what area a case belongs to in combination with the case allocation plan enables us to identify who served as rapporteur in a certain case." (Wittig 2016:26) Wittig's identification strategy consisted of two steps, both of which involved case allocation plans: first, she "coded the issue area for each case based on the areas the case allocation plan provides," then she proceeded by "matching the issue area with the information from the case allocation plan … to specify the rapporteur" (Wittig 2016:46).
Wittig was also the first to provide a meaningful robustness check: in order to validate the matchings she obtained, she double-checked them against information "from the Court as well as press releases and other sources that occasionally name the rapporteur," which "showed that our strategy was quite reliable" (Wittig 2016:46). This demonstrated how case allocation plans are useful for empirical research, but it also confirmed the previously noted lack of data availability: "case allocation plans are publicly accessible only from 1993 on … [and] even at the Court, the case allocation plans do not exist completely" (Wittig 2016:26). Shortly after this assessment was written, the German Federal Courts Dataset was released in early 2017, including Wittig's data, which the author had kindly provided upon request.

Swalve (2019) was the first to use the new dataset: he analyzed German Federal Court of Justice (BGH) decisions to find out how collegial decision making affects judicial deliberation. Using an identification strategy similar to that of Gschwend et al. (2016), Swalve builds on anecdotal evidence that some court divisions occasionally face staff shortages and end up with newcomer or "outsider" judges stepping in to fill vacancies. These judges tend to be unfamiliar with the division's decision-making culture, so the study analyzes the effect that such quasi-random shocks had on the decision-making mode by which the division disposed of its case load (either unanimous court order or deliberated judgment). To identify "outsider" judges, especially in the years prior to 2005, the author used the German Federal Courts Dataset, which enabled him to classify individual judges as "inside" or "outside" any given court division.

D. Scope of the German Federal Courts Dataset
The German Federal Courts Dataset consists of two components: an online repository of PDF files that provides convenient access to well over 3,000 pages of previously inaccessible archival documents, which were digitized and long-term archived in digital form (docuset; see Section II), and a machine-readable rendition of a part of these PDF documents (dataset), which was published online in spreadsheet format (see Section III), and as linked open data within an established data repository (see Section IV). The latter is a novel and unique strategy in empirical legal studies that seeks to explore new ways of providing access to judicial open data. This may serve both as a pilot for further data extraction from the docuset, and as a role model for future dataset-generating projects, which are considered one of three pillars of empirical legal studies more generally (Epstein & Martin 2014:15).

II. Docuset: Turning Paper Archives into PDFs
As explained in Section I, the seven federal courts of Germany document their judicial staff lineup and their judges' subject matter competencies in case allocation plans, which form the basis of the present collection of digitized documents (docuset).

A. Data Sources
Over the course of the last 70 years, case allocation plans were maintained by court officials in internal paper records and published mainly in the Federal Announcements Gazette (Bundesanzeiger). This happened in three different formats:
1. From the founding era of the German federal courts in the 1950s until 1966, allocation plans were interspersed with other government announcements and published in the newspaper-sized gazette in a six-column broadsheet format on newsprint.
2. From 1967 until 2012, the allocation plans of five federal courts (the two others joined in 1994 and 1996) were collated and jointly published in an annual supplement to the gazette, that is, an A4-sized booklet that was printed and distributed alongside the regular (newspaper-sized) gazette issues.
3. Since 2013, the gazette has been published online, so its contents, including allocation plans, are available in PDF format from www.bundesanzeiger.de.
Digitizing the pre-2013 documents was challenging for various reasons. First, the gazette of earlier years is hard to come by: many libraries store these issues in underground or external stacks, available only on individual order. This was a particular problem for the 1950-1966 issues, for which one cannot anticipate the precise location of allocation plans within the running text (only one federal court, the BGH, maintains a list of citations). Second, all publicly preserved copies of the gazette are bound in large archival volumes of thousands of pages each, which would have to be damaged to capture allocation plans without a deep centerfold distorting the image. Third, the broadsheet format of the gazette could not be captured with standard-sized scanning appliances anyway; this even held for the post-1967 supplement issues, which were bound in with the main issues in most libraries. (Despite extensive research, no library was found to retain loose copies.) Fourth, the thin newsprint paper had strongly decayed over several decades, as evidenced by two such allocation plans that were captured using a digital camera: see www.richter-im-internet.de/bgh/1951 and …/1952. To obtain better-quality copies of the allocation plans that could be used for digitization and OCR, I contacted the private publishing house that distributes the gazette. Its internal archive department had preserved loose copies of the post-1967 supplement issues for all years except 1976, meaning that 98 percent (261 out of 266) of the 1967-2012 allocation plans could be obtained from the publisher for a fee. The five missing plans were digitized (at lower quality) from a library holding.
Regarding the pre-1967 plans, digitizing them from the gazette had turned out to be prohibitively difficult, so I contacted all courts directly and inquired about their file record holdings. Two of the seven courts (BAG and BGH) were willing and able to furnish typewritten allocation plans directly from their court records, while the other five courts denied having even retained files from their early years.¹ Another eight allocation plans for the Constitutional Court were provided by a member of the project's academic supervisory board, who had previously collaborated on a project funded by the German National Science Foundation (DFG), "The Federal Constitutional Court as a Veto Player" (Hönnige & Gschwend 2010; www.ccdb.eu).

B. Data Processing and Publication
A total of 1,336 pages were digitized from the gazette archives and processed using optical character recognition (OCR); another 3,340 pages were digitized by the two courts from their paper files, and subsequently OCRed by myself; 36 further pages were obtained for the Constitutional Court. In years for which two versions of an allocation plan had been obtained (one from the gazette, one from the court directly), the gazette version was preferred for reasons of improved legibility.
The resulting 3,169 digitized pages across 352 different documents were all saved in portable document format (PDF), with appropriate meta-data added to the file headers. These meta-data included a note on each file's free reusability (so-called CC0 license) resulting from the fact that case allocation plans are in the public domain as "official works" under Section 5(1) of the German Copyright Act (Urheberrechtsgesetz). The PDF files range in size from 42 kB to 14.7 MB, with a total file volume of 572.7 MB. For a detailed breakdown, see Table 2.
To disseminate these files effectively, and to archive them permanently for future research use, the entire docuset was stored in a public repository. As the most suitable outlet, I identified a re3data.org research data repository funded by the DFG (dx.doi.org/10.17616/R37M1J): the <intR>²Dok repository, short for "international interdisciplinary research document repository of legal studies" (intr2dok.vifa-recht.de). Upon being entered into this repository, each case allocation plan was assigned a persistent identifier. Additionally, the archived docuset was offered in an accessible and easy-to-use frontend in the form of a newly created web portal, www.richter-im-internet.de (Richter im Internet, i.e., Judges on the Web). This interface allows users to view and download each of the documents easily without even visiting the data repository.

C. Summary
As a result of the described process, 352 separate allocation plans of 3-31 pages each (nine on average) were eventually digitized and published online as PDF files. This covered more than three-quarters of all allocation plans by the seven federal courts of Germany since their respective establishment.

III. Dataset: Turning PDFs into Spreadsheets
While the docuset provides convenient access for humans to read and extract information from the allocation plans, statistical analyses also require machine-readable data that can readily be imported into software packages. As outspoken empiricists argue, one of the "research objectives" of empirical legal studies is "the creation of public multi-user datasets" (Epstein & Martin 2014:15). Thus, raw data had to be extracted from the docuset (Section III.A), curated, and purified (Section III.B). The dataset thus obtained (Section III.C) will realize its full potential once combined with other data sources (Section III.D).

NOTE: A court's establishment year was counted as 1 if establishment was no later than July, and 0 otherwise. Plans in the shaded row were subsequently used to create the LOD dataset described in the next sections.

A. Raw Data Extraction
Extracting data from digitized allocation plans was a labor-intensive manual task that had to be restricted to only a part of the docuset in the funding phase of the project. The extraction was thus limited to the Federal Court for Civil Law, which is part of the most prominent and most prolific top-tier court in Germany (BGH), one of only two courts for which the docuset was complete for the entire duration of the court's existence. Staff lineups were extracted from the tabulations in these PDF files. Research assistants manually transcribed these into an Excel sheet with one row for each judge in a given year, and one column for each of the following data contained in the PDF tables: year, identifier (number) of the judge's division, his or her function in that division (presiding, deputy presiding, or assistant), the gender indicated by the grammatical inflection of that function, the judge's academic degrees (e.g., Dr., Prof.), his or her name prefix (e.g., nobility indicators), surname, and first name, as well as any comments contained in the allocation plan (usually about secondary positions held by a judge, or about interim allocations for when his or her tenure commenced or expired during the calendar year).
The transcription assistants were free to copy and paste data from the OCRed PDF, but urged to verify textual consistency carefully. No modifications whatsoever to the original text were allowed at this point. The objective was to obtain a fully authentic Excel transcript (Lesefassung) of the PDF table. Wherever assistants noted errors or inconsistencies in the originals, they pointed them out in an additional "editorial remarks" column.
To validate the assistants' output, I repeatedly sampled documents and compared them with the transcription. Very few deviations appeared, and these were discussed with the assistants. As further sampling did not reveal additional deviations, confidence in the transcription procedure seemed warranted.

B. Data Integrity and Completeness Checks
As the manual process of data extraction concluded, two new challenges surfaced that needed to be addressed: 1. A surprising number of mistakes and inconsistencies in the source material had accumulated in the "editorial remarks" column. For instance, one allocation plan misstated its year, another dropped a diacritic letter from a judge's surname, and several judges had been assigned inconsistent gender labels across different court years. 2. The source material generally did not document judges' first names, and only used first-name initials where required for disambiguation. This made post-hoc identification of individual judges challenging, and even a count of just how many individuals had been identified could not reliably be obtained at first.

Solving these issues required additional data. I turned to one of the commemorative volumes (Festschrift) that federal courts publish on important occasions, this one on the 50th anniversary of the Federal Court of Justice in 2000 (Geiß et al. 2000). This book contains an appendix detailing which judges worked for the court in the past (2000:787-832). Unfortunately, this appendix was not available online, and neither the authors nor the publisher were able to provide it on request. It was thus redigitized, OCRed, and transcribed by research assistants in exactly the same way as the allocation plans had been. The table contains variables such as each judge's academic degree, date and place of birth, last public office before entering the court,² dates of entering the court, and (where applicable) of ascension to division president or (vice) court president, as well as exit dates and reasons for leaving the court, plus (where applicable) date of death. The additional "editorial remarks" column that we added during transcription pointed out some name typos, a wrong birth year, an impossible exit date (February 29, 1974), a nonexistent birth place in East Germany, and a date of death that had accidentally been appended to the wrong person on the line below. These mistakes were quickly identified and easily corrected.
Subsequently, both transcribed tables were matched on judges' names as well as their years of tenure at the court. This revealed three further discrepancies (two inconsistently spelled names, and a judge listed as active even after retirement) that were resolved using online sources, and helpful advice from the current Federal Court staff. Upon correcting both transcribed tables, matching them through Stata's merge command finally resolved the inconsistencies discovered in the allocation plans, and enabled us to identify judges' first names so they could be unambiguously distinguished. It turned out that the allocation plans of 1950-2018 had documented 433 civil law judges, with 392 unique and 18 ambiguous surnames (most prevalent were Schmidt, Fischer, Meyer, Müller, and Schneider, with three appearances each).
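The matching logic described above (surname plus tenure years) can be illustrated in a few lines of Python. This is a simplified analogue under stated assumptions, not the Stata procedure actually used: the column names, the helper `match_judge`, and all sample rows (with placeholder names) are hypothetical.

```python
from collections import Counter

# Hypothetical rows standing in for the two transcribed tables.
plan_rows = [
    {"surname": "Muster", "year": 1955},
    {"surname": "Muster", "year": 1975},
    {"surname": "Beispiel", "year": 1975},
]
register = [
    {"surname": "Muster", "first_name": "Anna", "entry": 1950, "exit": 1960},
    {"surname": "Muster", "first_name": "Bernd", "entry": 1970, "exit": 1985},
    {"surname": "Beispiel", "first_name": "Clara", "entry": 1972, "exit": 1990},
]


def match_judge(row, register):
    """Return register entries with the same surname whose tenure covers the year."""
    return [
        entry for entry in register
        if entry["surname"] == row["surname"]
        and entry["entry"] <= row["year"] <= entry["exit"]
    ]


matched, unresolved = [], []
for row in plan_rows:
    candidates = match_judge(row, register)
    if len(candidates) == 1:
        # Exactly one candidate: the first name can be added safely.
        matched.append({**row, "first_name": candidates[0]["first_name"]})
    else:
        # Zero or several candidates: leave for manual review.
        unresolved.append(row)

# Surnames borne by more than one judge are ambiguous without tenure data.
surname_counts = Counter(entry["surname"] for entry in register)
ambiguous_surnames = sorted(s for s, n in surname_counts.items() if n > 1)
```

In this toy example the surname "Muster" is ambiguous on its own, but the tenure window resolves both plan rows to a single person each, which mirrors how the two sources complement one another.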
I then went on to convert the staff allocation tables into a machine-readable dataset with binarized indicator variables (dummies) in place of its previously categorical variables (gender, doctoral degree, presiding judge, etc.).
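As a rough illustration of this binarization step, the sketch below turns one transcribed row into 0/1 indicator variables. The column names and category codes ("f", "presiding", and so on) are assumptions chosen for the example; the published codebook defines the actual variables.

```python
def to_indicators(row):
    """Convert categorical fields of one judge-year row into 0/1 dummies."""
    return {
        "female": int(row["gender"] == "f"),
        "doctorate": int("Dr." in row["degrees"]),
        "professor": int("Prof." in row["degrees"]),
        # Exact comparison, so "deputy presiding" is not counted as presiding.
        "presiding": int(row["function"] == "presiding"),
    }


# Hypothetical transcribed row.
example_row = {"gender": "f", "degrees": "Prof. Dr.", "function": "deputy presiding"}
example_indicators = to_indicators(example_row)
```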

C. Summary
Data extraction and refinement resulted in 5,288 data entries for the allocation plans, each coding for one judge-year from 1950 through 2018. I published two Excel files (gvp.xls for the allocation plans, fs50.xls for the Festschrift appendix), each with a sheet explaining the codebook and two additional sheets containing the data (see Table 3).
One sheet contained the transcribed Lesefassung version, the other the machine-readable rendition with indicator variables and a new column for unique Wikidata QIDs (see Section IV), as needed to match datasets. The machine-readable sheets were additionally rendered as separate .csv files (comma-separated values) in Unicode format (UTF-8) to ensure cross-platform compatibility with any statistics package.

² Of the listed judges, 79.9 percent had previously been judges at a state court, 11.6 percent were government or ministry officials, 4.6 percent public prosecutors, 3.2 percent private attorneys, and 0.7 percent legal academics.

D. Usage Potential
The dataset was not designed to address a single research question, but to serve as a community infrastructure and leverage the "combinatoric advantage" of multidimensional data collections (Epstein & Martin 2014:15-16). To illustrate this advantage and preliminarily explore potential uses of the dataset, let us first turn to descriptive statistics. Figure 1 displays the proportion of female civil law judges since the Federal Court's establishment in 1950. We see a largely stagnant level at or below 5 percent until the mid-1970s, when the number started increasing, and (after a brief dip in the late 1980s) a steady increase in the female proportion starting around German Reunification (1989/1990). As a stylized fact, then, gender parity is on the rise, increasing by about 1 percent each year since 1990, a trend that, if continued, will see the Federal Court at a 50-50 gender ratio within the next 15 years. A similarly interesting time trend may be found in Figure 2. Figure 2 displays the proportion of judges with a professorship, which is one of the academic degrees reported in the original docuset files (see Section III.A). Such professorships come in two varieties: some are professional (meaning that a full-time academic is elected to the Supreme Court and continues to work as a judge, with or without teaching and research commitments on the side), most others are titular (meaning that an academically trained judge teaches at a given university so regularly that the university's professoriate confers an honorary professorship upon her). Either way, these "professorial judges" are presumably more research inclined and (possibly) more methodical than the average judge without such a degree.
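Yearly proportions of the kind plotted in Figures 1 and 2 follow directly from judge-year rows with indicator variables. The sketch below shows one way to compute them; the function name, the column names ("year", "female"), and the sample rows are hypothetical and do not reproduce the published data.

```python
from collections import defaultdict


def share_by_year(rows, flag):
    """Proportion of judge-year rows per year for which `flag` equals 1."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row["year"]] += 1
        hits[row["year"]] += row[flag]
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical judge-year rows (not actual dataset values).
sample_rows = (
    [{"year": 1950, "female": 0}] * 4
    + [{"year": 1990, "female": 1}]
    + [{"year": 1990, "female": 0}] * 3
)
female_share = share_by_year(sample_rows, "female")
```

The same function applies to any other indicator column, for instance a professorship dummy.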
Over time, their distribution plausibly follows a U-curve: from an all-time high level of professorial judges at the court's establishment, when (just after the fall of Nazism) Germany lacked professional judges who were both experienced and politically untainted, to an all-time low in the anti-intellectual, anti-elitist years around 1968, to a slow but steady increase since the early 1970s that has resulted in roughly every seventh judge holding a professorship. This development has, to my knowledge, not been described or reflected on in previous literature, but it may warrant additional analyses, possibly along the lines of the illustration below.
As informative as such descriptive analyses within the dataset may be, they do not exhaust its potential, which mostly lies in being used across data sources. That is, the above-cited "combinatoric advantage" grows as datasets are designed for matchability so they can easily be combined with other available data. To test and showcase one such example of data-matching, I used an already existing dataset on the German judiciary: the CAL² Corpus of German Law. This dataset was developed and curated by a research group on Computer-Assisted Legal Linguistics (CAL²) at the Heidelberg Academy of Sciences in Germany (see Vogel et al. :1355). It presently contains some 375,000 original documents (1.3 billion tokens) from three legal domains (judiciary, academia, legislation) in plain-text digital format with various layers of linguistic annotation (Vogel & Hamann 2018:323). To see whether it might be matched with the present dataset, I obtained all 14,744 rulings by the German Federal Court of Civil Law after 2000 from the CAL² corpus.

Using the docket identifier as described above (Section I.B), I extracted for each decision the court division that disposed of it, using this information to match the two datasets.
The resulting combined dataset is too rich to be fully explored here. I will merely venture to suggest, as a "proof of concept," one potential research question that the dataset could help to address: What impact do professorial judges have on the use of conventional legal methodology? Do they rely on different canons of construction than less-academically-inclined judges? Regarding the independent variable in that question, I created an indicator variable for whether a court division in a given year did or did not contain at least one professorial judge. As the dependent variable, I considered the canons of construction in German law, and chose three that stand out across legal orders (I labeled them "literalist," "historicist," and "purposivist"); others could, of course, be considered in future research.
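The independent variable described above, a division-year indicator for the presence of at least one professorial judge, can be built and attached to decisions as sketched below. All names and sample rows are hypothetical; in particular, the sketch assumes that each decision already carries the division and year derived from its docket identifier.

```python
def divisions_with_professor(judge_years):
    """Set of (division, year) pairs that include a professorial judge."""
    return {
        (row["division"], row["year"])
        for row in judge_years
        if row["professor"] == 1
    }


def tag_decisions(decisions, judge_years):
    """Add a 0/1 flag to each decision based on its division and year."""
    flagged = divisions_with_professor(judge_years)
    return [
        {**d, "prof_division": int((d["division"], d["year"]) in flagged)}
        for d in decisions
    ]


# Hypothetical inputs.
judge_years = [
    {"division": 1, "year": 2010, "professor": 0},
    {"division": 1, "year": 2010, "professor": 1},
    {"division": 2, "year": 2010, "professor": 0},
]
decisions = [
    {"docket": "I ZR 1/10", "division": 1, "year": 2010},
    {"docket": "II ZR 1/10", "division": 2, "year": 2010},
]
tagged = tag_decisions(decisions, judge_years)
```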
For each of these three canons, I coded indicator variables for their most prominent linguistic markers. For example, when a decision referred to Wortlaut* or grammat* (i.e., "statutory language") it was coded as "literalist," when it referred to Geschicht* or histor* (i.e., "historical" terms) it was coded as "historicist," and when it referred to Sinn und Zweck or teleol* (i.e., objectives or "teleology") it was coded as "purposivist." Since these canons are not mutually exclusive, each decision could refer to more than one canon, or none.³ Figure 3 summarizes the result of this exploration. It shows the percentage of all 14,744 decisions that referred to one of the three canons, separated by whether the authoring court division featured at least one professorial judge or not. We observe, first, that purposivist interpretation has long been much more prominent than either literalist or historicist canons. Second, there are indications of a developing divergence in recent years between professorial judges who rely a little less on purposivist canons, and nonprofessorial judges who seem increasingly to rely on them. Unfortunately, the time series ends in 2015 (marking the current data horizon of the CAL² Corpus of German Law), so it remains unclear whether this gap is an outlier or a continuing trend. It seems interesting enough, however, to continue exploring as additional data become available, and to develop causal models for the trend.
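The marker-based coding of canons described above can be approximated with regular expressions. The sketch below is illustrative only: it assumes that the trailing asterisk denotes a prefix match at the start of a word and that matching is case-insensitive, neither of which the text specifies. The function name and the sample sentence are hypothetical.

```python
import re

# Markers taken from the description above; "\w*" stands in for the "*" wildcard.
CANON_MARKERS = {
    "literalist": [r"\bWortlaut\w*", r"\bgrammat\w*"],
    "historicist": [r"\bGeschicht\w*", r"\bhistor\w*"],
    "purposivist": [r"\bSinn und Zweck\b", r"\bteleol\w*"],
}

# One compiled alternation per canon (case-insensitive by assumption).
CANON_PATTERNS = {
    canon: re.compile("|".join(markers), re.IGNORECASE)
    for canon, markers in CANON_MARKERS.items()
}


def code_canons(text):
    """Return a 0/1 indicator per canon; canons are not mutually exclusive."""
    return {
        canon: int(pattern.search(text) is not None)
        for canon, pattern in CANON_PATTERNS.items()
    }


# Hypothetical sentence from a decision.
sample_codes = code_canons("Nach Wortlaut sowie Sinn und Zweck der Vorschrift")
```

Because each canon is coded independently, a decision can receive several flags at once, or none, as in the original coding scheme.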
As (merely) a proof of concept, the present exploration demonstrates how the German Federal Courts Dataset might be fruitfully combined with other datasets to analyze determinants of judicial decision making in Germany, which another recent study exemplified even more artfully (Swalve 2019; see Section I.C).

IV. Linked Open Data: Conversion to Repository Items
To enable other researchers to pick up on such observations and match the Federal Courts Dataset with their own research data, we eventually converted the dataset into linked open data (LOD), which were then imported into a data repository. To do this, additional data were harvested from Wikipedia: its "List of Judges at the German Federal Court of Justice" (de.wikipedia.org/wiki/Liste_der_Richter_am_Bundesgerichtshof) contains, in tabulated plain-text format, names of judges, their birth years, and significant events of their tenure at the court. From this list, we were able to extend the dataset by another 159 criminal-law judges. Finally, we added the latest announced court hires, which had not been included in other lists, and ended up with the comprehensive set of 597 judges who have ever served at the German Federal Court of Justice.

A. Repository Selection and Import
To deposit these data in an appropriate repository, we relied on previous studies that had compared various LOD databases and concluded that the Wikimedia Foundation's "Wikidata" is among the most flexible (Färber et al. 2016) and most widely used: Google's Freebase has been completely integrated since 2015, and various cultural institutions, such as the BBC and MoMA, use the repository to deposit or retrieve data (see Wikidata 2018). This meant that using Wikidata would provide the Federal Courts Dataset with a visible and versatile interface for researchers and other users interested in open government data.
To familiarize readers with the Wikidata ontology, I briefly explain its data model. Each entity is modeled as an "item" identified by a "QID." Each item is characterized by indefinitely many "statements" that each consist of a "property," a "value," and a limited set of "qualifiers." To illustrate this with a concrete example, consider the item Q32961325 (wikidata.org/wiki/Q32961325): its first "statement" consists of the property "P31," which translates to "instance of," and the value "Q1172284," which translates to "dataset," and is qualified by "P642: Q16533" (which translates to "of: judges"). Apart from these language-independent statements, which interlink to form the knowledge graph of the database, each item can hold up to three pieces of language-specific information: label, description, and alias. In the case of Q32961325, the label reads "German Federal Courts Dataset" in English, but "Richter-im-Internet.de" in German. Judges would be similarly modeled as items, and linked back to Q32961325, which serves as a quotation source (P248) for each statement imported into the repository.
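For readers who think in code, the data model just described might be mirrored by two small Python classes. This is a deliberately simplified sketch: the class and field names are hypothetical, values are restricted to item QIDs (real Wikidata values can also be dates, strings, or quantities), and descriptions and aliases are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    prop: str                                        # property ID, e.g. "P31"
    value: str                                       # simplified: always a QID
    qualifiers: dict = field(default_factory=dict)   # property ID -> value

@dataclass
class Item:
    qid: str
    labels: dict = field(default_factory=dict)       # language code -> label
    statements: list = field(default_factory=list)

# The dataset's own item, using only the IDs and labels given in the text.
gfcd = Item(
    qid="Q32961325",
    labels={"en": "German Federal Courts Dataset", "de": "Richter-im-Internet.de"},
    statements=[Statement("P31", "Q1172284", qualifiers={"P642": "Q16533"})],
)
```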
The actual import utilized the pywikibot framework on Wikidata's MediaWiki interface (API). During this process, 352 judges (59 percent) were found already to possess a Wikidata QID, which seems rather low compared to the U.S. Supreme Court, for which the coverage may be much closer to 100 percent. This meant that 245 new Wikidata items had first to be created before the import procedure added statements, as far as necessary, to both the new and old items. These statements included, for each judge, "instance of" ("human"), their gender and nationality, their first, last, and full name, as well as their academic degree, place and date of birth and date of death, occupation ("judge"), employer ("Federal Court of Justice of Germany"), work location ("Karlsruhe"), and positions within the court and within the judge's respective division (assistant/deputy presiding/presiding judge). Statements about the employer and the positions of judges were qualified by adding start and end times.
Apart from these language-independent statements, the judge's full name was deposited as an item label in various languages, along with a description ("German judge, Federal Court of Justice") and its translation into seven other languages: Dutch, French, German, Italian, Mandarin, Polish, and Spanish.
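To make the structure of the imported statements more tangible, the following sketch assembles a few of them for a single judge as plain Python dictionaries. It is not the import script itself and makes no pywikibot calls. The property and item IDs other than P31, P248, Q16533, and Q32961325 are my reading of standard Wikidata identifiers (Q5 "human," P106 "occupation," P108 "employer," P580 "start time," P582 "end time") and should be verified before reuse; the record fields and the employer placeholder are hypothetical.

```python
SOURCE_QID = "Q32961325"  # German Federal Courts Dataset item (from the text)

def build_statements(judge):
    """Return a list of statement dicts for one judge record (hypothetical fields)."""
    def stmt(prop, value, qualifiers=None):
        # Every statement cites the dataset item as its source via P248.
        return {
            "property": prop,
            "value": value,
            "qualifiers": qualifiers or {},
            "reference": {"P248": SOURCE_QID},
        }

    # Start and end times qualify the employer statement; end may be missing.
    tenure = {"P580": judge["start"]}
    if judge.get("end"):
        tenure["P582"] = judge["end"]

    return [
        stmt("P31", "Q5"),          # instance of: human (assumed ID)
        stmt("P106", "Q16533"),     # occupation: judge (P106 is an assumed ID)
        stmt("P108", judge["employer_qid"], tenure),  # employer, with tenure
    ]

# Invented record; the employer QID is a placeholder, not a real identifier.
example = build_statements(
    {"start": "1950-10-01", "end": None, "employer_qid": "Q_COURT_PLACEHOLDER"}
)
```

Positions within the court and its divisions would follow the same pattern, each with its own start and end qualifiers.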

B. Data Access and User Interfaces
Wikidata now documents, for the first time, all 597 judges at the Federal Court of Justice, with rich additional data on each of them, carefully curated and cross-validated. These data can be accessed by researchers in several ways:
1. Each item can be viewed by manually appending its QID to the Wikidata URL, as in wikidata.org/wiki/Q1510984 to access data on the court's first female judge.
2. Items can be downloaded in bulk by using Python to dump data, for example, in JavaScript Object Notation (JSON) format; requisite scripts are freely available at the GitHub repository www.github.com/FUB-HCC/wikidata_bot.
3. Items can be queried on the fly using the graph-based SPARQL query language endpoint (query.wikidata.org). This allows for much more complex search patterns than conventional full-text search. As an illustration, see the query that can be replicated at https://w.wiki/5Ey.
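Independently of the query credited in this paper, a SPARQL request to the endpoint could be issued from Python roughly as follows. The query below is my own hedged sketch, not the original one: it assumes that employer statements (P108, an assumed property ID) carry a reference naming Q32961325 as their source (P248), and it relies on the prefixes that the Wikidata query service predefines. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not the paper's query): items whose
# employer statement cites the German Federal Courts Dataset as its source.
QUERY = """
SELECT DISTINCT ?judge ?judgeLabel WHERE {
  ?judge p:P108 ?statement .
  ?statement prov:wasDerivedFrom ?reference .
  ?reference pr:P248 wd:Q32961325 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
}
LIMIT 20
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params

def run_query(query):
    """Send the query and return the judge labels (requires network access)."""
    request = urllib.request.Request(
        build_request_url(query),
        headers={"User-Agent": "gfcd-example/0.1 (illustrative sketch)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [row["judgeLabel"]["value"] for row in data["results"]["bindings"]]
```

Calling run_query(QUERY) would return up to 20 labels if the assumptions about the reference structure hold; otherwise the result is simply empty and the query pattern needs adjusting.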

V. Conclusion
The German Federal Courts Dataset (www.richter-im-internet.de) consists of a docuset of 3,000+ pages, as well as a dataset of some 6,000 spreadsheet entries, which have simultaneously been modeled as linked open data that are accessible via wikidata.org. This presents a rich new resource for empirical legal scholars that is accessible through various interfaces. Its potential can be gleaned from cursory explorations, and will be fully utilized by matching this dataset with other data on judicial decision making.
This will allow European legal scholars to study adjudication more systematically and engage in transatlantic comparative projects. Policymakers have clearly signaled their interest in such research by declaring the Federal Courts Dataset one of Germany's 17 scientific "Landmark Ideas" of 2017 (see richter-im-internet.de/ldi2017.pdf). It is now up to the research community jointly to push the frontiers of our knowledge about European adjudication.
NOTE: Query to be replicated at https://w.wiki/5Ey, courtesy of Marisa Nest.