Images of Roman Imperial Denarii: A Curated Data Set for the Evaluation of Computer Vision Algorithms Applied to Ancient Numismatics, and an Overview of Challenges in the Field

Automatic ancient Roman coin analysis only recently emerged as a topic of computer science research. Nevertheless, owing to its ever-increasing popularity, the field is already reaching a certain degree of maturity, as witnessed by a substantial publication output in the last decade. At the same time, it is becoming evident that research progress is being limited by a somewhat veering direction of effort and the lack of a coherent framework which facilitates the acquisition and dissemination of robust, repeatable, and rigorous evidence. Thus, in the present article, we seek to address several associated challenges. To start with, (i) we provide a first overview and discussion of different challenges in the field, some of which have been scarcely investigated to date, and others which have hitherto been unrecognized and unaddressed. Secondly, (ii) we introduce the first data set, carefully curated and collected for the purpose of facilitating methodological evaluation of algorithms and, specifically, the effects of coin preservation grades on the performance of automatic methods. Indeed, until now, only one published work at all recognized the need for this kind of analysis, which, to any numismatist, would be a trivially obvious fact. We also discuss a wide range of considerations which had to be taken into account in collecting this corpus, explain our decisions, and describe its content in detail. Briefly, the data set comprises 100 different coin issues, all with multiple examples in Fine, Very Fine, and Extremely Fine conditions, giving a total of over 650 different specimens. These correspond to 44 issuing authorities and span the time period of approximately 300 years (from 27 BC until 244 AD). In summary, the present article should be an invaluable resource to researchers in the field, and we encourage the community to adopt the collected corpus, freely available for research purposes, as a standard evaluation benchmark.


Introduction
It is no longer an exaggeration to say that computer vision is pervasive in everyday life: face detection [1,2] has been a standard feature of digital cameras and smartphones for well over a decade, online image depositories are increasingly successful at categorizing images by their semantic content (scene: beach, city, countryside, etc.; objects: cars, buildings, dogs, churches, statues, etc.) [3][4][5], automatic diagnosis and prognosis of diseases has even surpassed the performance of human experts in some domains [6][7][8], etc. This success, coupled with the increasing pervasiveness of powerful computing devices and the dramatic improvement in the user-friendliness of technology in general, is having a positive impact on inter-disciplinary research, with a growing interest in the application of modern computer science in other scientific fields, as well as in the arts and humanities [9][10][11].
A particularly interesting domain of application concerns ancient numismatics, i.e., the study of ancient currency, which has been attracting an increasing amount of attention from the computer vision community. The focus of the present article is on a number of mainly methodological issues that are important in this increasingly prolific research area, which we argue have received insufficient attention in the published literature to date. To understand our contributions, it is necessary to introduce some basic numismatic terminology, which we do next.

Computer Vision and Machine Learning Challenges within the Domain of Ancient Numismatics
We begin this section with an explanation of the relevant numismatic terminology necessary for the understanding of the present article and the related literature, then categorize and describe in detail the most important (practically and technically) challenges in the field, and summarize the progress to date in addressing these.

Terminology
The specialist vocabulary of numismatics is extremely rich, and its comprehensive review is beyond the scope of the present article [12]. Herein, we introduce a few basic concepts that are important for the understanding of the present contribution and related works.
Firstly, when referring to a 'coin', the reference is being made to a specific object, a physical artifact. It is important not to confuse it with the concept of a (coin) 'issue', which is more abstract in nature [13]. Two coins are of the same issue if the semantic content of their obverses and reverses (heads and tails in modern, colloquial English) is the same. For example, if the obverses show individuals (e.g., emperors), they have to be the same individuals, be shown from the same angle, have identical headwear (none, crown, wreath, etc.), be wearing the same clothing (drapery, cuirass, etc.), and so on. Moreover, any inscriptions, usually running along the coin edge (referred to as the 'legend'), also have to be identical, though not necessarily be identically arranged spatially letter by letter [14]. Online Coins of the Roman Empire (OCRE; see http://numismatics.org/ocre/), a joint project of the American Numismatic Society and the Institute for the Study of the Ancient World at New York University, lists 43,000 published issues. The true count is likely to be even greater.

Grading
An important consideration in numismatics regards the condition of a particular coin. As objects that are a millennium and a half to three millennia old, it is unsurprising that, in virtually all cases, they have suffered damage. This damage was effected by a variety of causes. First and foremost, as most coins were used for day-to-day transactions, damage came through proverbial wear and tear. Damage was also effected by the environment in which coins were stored, hidden, or lost, before being found or excavated-for example, the moisture or acidity of soil can have significant effects. Others were intentionally modified, for example, for use in decorative jewellery.
The amount of damage to a coin is of major significance both to academic and hobby numismatists. To the former, the completeness of available information on rare coins is inherently valuable, but equally, when damaged, the type of damage sustained by a coin can provide contextual information of the sort discussed earlier. For hobby numismatists, the significance of damage is twofold. Firstly, a better-preserved coin is simply more aesthetically pleasing. Secondly, the price of the coin, and thus its affordability as well as its investment potential, are greatly affected: the cost of the same issue can vary by 1-2 orders of magnitude.
To characterize the amount of damage to a coin due to wear and tear, as the most common type of damage, a quasi-objective grading system is widely used. Fair (Fr) condition describes a coin so worn that even the largest major elements are mostly destroyed, making even a broad categorization of the coin difficult. Coins of Very Good (VG) grade have most detail worn nearly smooth around the central areas but still visible on the periphery. Fine (F) condition coins show significant wear with many minor details worn through, but the major elements are still clear at all of the highest surfaces. Very Fine (VF) coins show wear to minor details, but clear major design elements. Finally, Extremely Fine (XF) coins show only minor wear to the finest details. Examples are shown in Figure 1.

Practical Applications
One of the features of numismatics which makes it an interesting domain for the application of computer vision and machine learning lies in the number and diversity of specific problems that it presents. Many of these directly correspond to challenges faced by experts or hobby collectors, though some new work introduces innovative challenges which are only possible with the use of technology (we shall elaborate on this shortly). The key problems, few of which can be considered anywhere near solved, include the following:
• Specimen matching;
• Issue matching [15][16][17];
• Denomination categorization;
• Issuing authority recognition [18];
• Legend readout [14];
• Semantic analysis [19];
• Forgery recognition;
• Die matching.
As implicitly explained in the previous section, specimen matching refers to the problem of determining if the coin specimens in two images are the same, i.e., if the images show the same actual physical artefact. There are several important applications of this task. For example, it can be used to determine the provenance of a specific coin or to track its value across time as it is sold and passed on from one collector to another. Importantly, specimen matching can also be used to automatically monitor massive volumes of coins sold on non-traditional auction web sites, such as eBay, and to track stolen coins. The key challenges for specimen matching lie in differential appearance effected by different illumination conditions, camera settings (e.g., aperture, focus, and exposure), clutter, scale, and viewpoint [20].
In contrast to specimen matching, issue matching refers to the problem of determining if the coins shown in two images are of the same issue, i.e., if they contain the same semantic content and are of the same denomination (e.g., denarius, antoninianus, follis, sestertius). This task is the first and probably the most commonly performed one by any numismatist; colloquially put, it answers the question "What is this coin I've got?". In addition to all of the aforementioned challenges outlined in the context of specimen matching, in issue matching, a major challenge of a semantic nature emerges: recall that issues are identified by the corresponding semantic contents, which can exhibit stylistic variability (e.g., due to different die engravers) as well as appearance change due to physical damage, chemicals in the environment, or die wear, to name but a few; see Figure 2. Recalling from the previous section that the number of different issues of Roman Imperial coins exceeds 43,000, it is not difficult to see why issue matching is inherently an extremely difficult problem [16]. In addition, such a high number of classes makes it practically all but impossible to obtain an annotated gallery of exemplars of all (or most) issues [19,21].
Figure 2. Reverses of two different specimens of the same issue-a silver (AR) denarius of Julia Maesa (RIC 249). Despite them being the same issue, the two specimens exhibit a series of appearance differences. These range from the arrangement of legend letters (e.g., note that the 'I' in 'FECVNDITAS' is to the left-as seen by a reader-of the goddess depicted on the specimen in (a) and to the right in (b)), the exact pose of the child next to the goddess (Fecunditas) or indeed the goddess herself, the flan shape and the centering of the motif within the flan, the damage and loss of fine detail, and the toning ('color' change).
Denomination categorization is a classification problem which, as the name suggests, is concerned with the determination of the denomination of a coin. Denarii, antoniniani, sestertii, ases, and dupondii are examples of the most common denominations of the Roman Imperial period before the economic crisis of the third century. Some of these are shown in Figure 3. The knowledge of a coin's denomination can be useful as a step aiding in issue matching or in its own right for monitoring market trends (types of coins being sold, price changes, etc.).
Most Roman imperial coins feature a portrait (all but universally in profile, and usually facing right). Most often, this is the current emperor, sometimes their predecessor (as commemoration following their death), and also frequently their spouse. The recognition of this individual is one of the first things that a numismatist will do in the process of identifying a coin, i.e., it is a step in the process of issue recognition. Within the scope of computer-vision-based analysis of ancient coins, issuing authority recognition started attracting attention following the realization that tackling issue recognition is a far more difficult challenge than anticipated at first. Hence, the attempts to apply generic object recognition algorithms waned in popularity, and instead, the focus shifted towards the use of more domain-specific knowledge, the recognition of the depicted person being an obvious choice.
The challenge of legend readout concerns the recognition of the legend inscription. So far, it has received little attention from the computer vision community [14] despite its utility to numismatists. In large part, this is likely a consequence of the difficulty of the problem: legends abound in fine detail and are prone to damage, with letters easily confused with one another, or indeed a damaged letter with a legend break. The legend on an ancient Roman imperial coin is an interesting semantic element. Some parts of it contain, in essence, the same information as the motif they encircle. For example, on the obverse, the legend almost invariably explicitly names the issuing authority shown on the coin-in Figure 4a, it begins with 'AVRELIVS', which refers to Marcus Aurelius. Thus, this information can be used to aid in the process of issuing authority recognition or, with reference to the reverse, in the interpretation of the corresponding motif. However, the legend also contains some information which is generally not contained elsewhere. For example, the legend often contains the consular year of the issuing authority, such as 'COS III' (third consular year), which allows for the precise dating of the issue and its disambiguation from other issues otherwise identical to it.
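To make the use of the consular element concrete, the following is a minimal sketch of how a 'COS' count could be extracted from a legend once the legend has already been read out as text. It does not address the (much harder) readout itself. The function names and the regular expression are illustrative assumptions rather than part of any published method, and the sketch assumes that a bare 'COS' without a numeral denotes the first consulship.

```python
import re
from typing import Optional

# Values of the Roman numeral symbols expected in a consular count.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral: str) -> int:
    """Convert a numeral written with I, V, X (e.g. 'III', 'IV', 'IIII')."""
    total = 0
    for i, symbol in enumerate(numeral):
        value = ROMAN_VALUES[symbol]
        # Subtractive notation: a smaller symbol before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN_VALUES[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


# 'COS' as a separate word, optionally followed by a numeral.
COS_PATTERN = re.compile(r"\bCOS(?:\s*([IVX]+))?\b")


def consular_count(legend: str) -> Optional[int]:
    """Return the consular count found in a legend string, or None if absent."""
    match = COS_PATTERN.search(legend.upper())
    if match is None:
        return None
    numeral = match.group(1)
    # Assumption: 'COS' with no numeral is read as the first consulship.
    return roman_to_int(numeral) if numeral else 1


print(consular_count("COS III"))
```

On damaged coins the numeral is often exactly the part that is lost, so in practice such a parser would need to work with uncertain or partial readouts rather than clean strings.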
We have already discussed issue matching as probably the most important and pervasive problem in automatic ancient coin analysis. A major and indeed fundamental problem with the existing approaches which rely on visual matching of images, as highlighted in Section 2.1, is that the number of classes in this classification problem is enormous, exceeding 43,000. This is not only a technical challenge, but also a practical one: it is virtually impossible to obtain gallery samples of such a high number of issues or indeed anything even close to that number. Yet, this was only recently explicitly recognized in the literature [19]. Along with this recognition, an alternative approach was put forward, as well as the first promising steps towards its implementation. The idea is very much akin to what a human numismatist does: interpret and understand the semantic content [22] of a coin (hence, semantic analysis), and then use this semantic description for matching against textual reference entries [23]. Thus, the visual matching problem is eventually turned into a text-matching one. This work is still in its early stages, but highly promising results have already been reported using a deep-learning-based framework capable of automatically learning salient concepts and the range of their artistic depiction variability [21].
Considering the size of the global ancient coin market, it is hardly surprising that it is an attractive target for fraud. Unlike most other ancient artifacts (e.g., highly ornate pottery, helmets and other armor, swords, etc.), for the most part, ancient coins are medium-value collectables. This makes it cost-ineffective to individually authenticate all but a small number of more expensive specimens. Yet, the high volume of sales makes forgery a lucrative business. Despite this major practical significance, interestingly, the task of automatic forgery detection has not been explored in any published work to date. What makes this observation even more surprising is that the problem is technically quite interesting. In particular, the novel challenge lies in the new kind of intra-class variability within the class of forgeries. This variability emerges as a consequence of different methods used to produce fake coins. While a thorough discussion of this topic is beyond the scope of the present article, the simple example in Figure 5 will serve to illustrate the gist of it. Specifically, compare an authentic example of a silver denarius of Clodius Albinus in Figure 5a with the three forgeries in Figure 5b-d. The first of the latter, in Figure 5b, is good in style, and was likely produced from a casting mold, itself made from an authentic specimen (as a 'negative' thereof). The lack of authenticity is given away by the casting sprue at 10-11 o'clock looking at the obverse, the relief pattern around the legend (especially on the reverse; it is highly unlike that of struck coins), and the surface of the coin (impressions of small casting bubbles). In contrast, the forgeries in Figure 5c,d are poor in style, most likely made from modern molds, and readily recognizable as being produced by casting and not striking. How this wide intra-class variability can be learned is an open question and arguably makes the problem one of novelty detection.
Recall that ancient Roman imperial coins were minted by striking a blank coin placed between hand-carved dies [24]. This is in contrast to casting, which was used briefly during the Republican period, as well as later in the production of medallions, probably due to their much larger size (often in excess of 50 g). Being able to tell if two coin specimens of the same issue were made using the same dies, i.e., die matching, is of much interest to research numismatists (and much less so to hobby collectors), because, for example, this allows for the inference of migratory patterns of peoples, trading routes, etc. Die matching can also assist in the fight against high-quality forgeries, some of which were struck in modern times but using ancient dies or copies thereof [25]. To the best of our knowledge, die matching remains an entirely unexplored challenge in the realm of automated ancient coin analysis.
Figure 5. Though reasonably convincing at first sight, the forgery in (b) is in fact a cast (as witnessed by the sprue at 10-11 o'clock looking at the obverse, as well as the fine surface features). The forgery in (c) is poor in style and thus utterly unconvincing. The poor style (though less so than the previous example) and inappropriate metal composition, evident from the toning of the specimen, also make the forgery in (d) an unconvincing one.

Research Effort to Date
As noted in the previous section, most research on the application of computer vision and machine learning in the domain of ancient numismatics focused on the problem of issue recognition, or, more specifically, visual issue matching. Within this body of work, in terms of technical underpinnings, visual matching based on local features (chiefly SIFT [26]) dominates the literature [15,16,27]. Though highly successful in a wide variety of object recognition tasks [28][29][30][31], these approaches were quickly found to perform very poorly in the context of the problems of interest herein, showing some success only in highly controlled conditions, i.e., when changes in illumination are small or non-existent, when images are devoid of clutter, and when the coins are canonically oriented. This is highly unrealistic in practice: assumptions of limited photometric variability do not hold, and the removal of clutter (segmentation) is difficult, as is geometric registration [20]. An illustration of just some of the challenges is shown in Figure 6.
In hindsight, the disappointing performance of local-feature-based methods ought not to be surprising. Ancient coins do not possess discriminative textural information [32,33]; if anything, textural variability is a confounding factor. Rather, appearance variation emerges from the (3D) geometry of coins and, thus, the manner in which light is reflected off them. Consequently, in terms of local appearance, most coins look alike, and the failure to make use of the geometric relationships between local features is critical. Driven by this insight, the best-performing local-feature-based method builds compound features in the form of directional histograms centered at automatically detected interest points [15]. Thus, both local and distal appearances are integrated, and the geometric relationship is captured. Nevertheless, though significantly surpassing the performance of the existing methods at the time, even this method failed to demonstrate practically useful matching rates.
Driven in part by the lack of success of what may be termed 'conventional computer vision' approaches on the one hand and the groundbreaking achievements of deep-learning-based methods on the other, much like other recent cultural-heritage-focused computer science work [34][35][36], more recent efforts in automatic ancient coin analysis have turned their attention to the use of neural networks. Thus, Schlag and Arandjelović [18] proposed a VGG16 deep-neural-network-based algorithm for issuing authority recognition, and demonstrated outstanding performance on three large corpora of data. Aslan et al. [17] used a network pre-trained on ImageNet, adapted to the domain using transfer learning, on a small data set of Roman republican coins with lesser success. The deep learning algorithms of Cooper and Arandjelović [19,21] and Anwar et al. [37] both focus on the semantics of motifs depicted on coins, the former on Roman imperial and the latter on Roman republican coins-the problem which we already noted as being extremely promising in terms of practical significance, and most interesting from the technical viewpoint.
Related work, not falling under the umbrella of computer vision as such nor machine learning, includes the acquisition of 3D scans of ancient coins [38][39][40]. This body of research is closer in spirit to efforts on the digitization and visualization of cultural artifacts [41,42], including temporal modeling [43][44][45] and hyperspectral imaging [46][47][48].

Curation: Motivation Thereof and Our Data
A major limitation of the published work in the realm of computer vision and machine-learning-based analysis of coins, or rather, of the evaluation methodology of this body of work, lies in the absence of an understanding of the heterogeneity of data used in it. In Section 2.2, we introduced the common standard for quantifying the degree of wear and tear suffered by a particular specimen (often referred to as the condition of the coin). Even the most inexperienced of ancient numismatists understands that, in general, the condition of a coin greatly affects the scope of analysis that it is useful for. As noted in Section 2.3, in certain instances, the legend is necessary for the precise determination of a coin's issue; yet, the legend, containing fine detail and some of the more elevated detail surfaces, often gets damaged significantly.
The aforementioned issues are likely to be even more significant when automatic computer-based analysis is used. Despite that, this problem was not recognized until the work of Fare and Arandjelović [49], who were the first to bring it to attention. In part, this is likely a consequence of the difficulty of obtaining a curated data set; hence, our present effort and contribution.
The earliest work generally used coins of a very high grade (in about Extremely Fine condition, and, notably, a small number of issues) [16,27,50]. Not only does this limit the scope of insights which the corresponding experiments provide, but coins such as these are also of the least interest in the context of automated analysis. Firstly, these coins are rare and comprise a very small proportion of coins handled by most numismatists. There is little gain in automating their processing. Secondly, exactly because coins in such a high state of preservation are rare, they are usually rather expensive and are sold by specialist dealers, and therefore normally accompanied by detailed information already (often including their provenance, previous owners and sales, etc.). Coins like these are normally not accidentally stumbled upon. The first work that included a more representative sample of real-world data is that of Arandjelović [15], who also used a much larger corpus (circa 3000 specimens). At the same time, because the corpus was not labeled according to the condition, it is difficult to gain much insight into the behavior of the proposed algorithm (or indeed any algorithm evaluated on the same data) and to seek an understanding of how well it performs as a function of a query (or gallery) coin's grade. The work which followed [18,19,21] also used larger and more diverse corpora, but again without any condition-based stratification.
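The kind of condition-based stratification argued for above can be sketched in a few lines. The snippet below is an illustrative example only: the function name and the result format (one tuple of grade, true issue, and predicted issue per query coin) are assumptions made for the sake of the example and do not correspond to any specific published evaluation protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One evaluation result per query coin: (grade, true issue, predicted issue).
Result = Tuple[str, str, str]


def accuracy_by_grade(results: Iterable[Result]) -> Dict[str, float]:
    """Report matching accuracy separately for each preservation grade."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for grade, true_issue, predicted_issue in results:
        total[grade] += 1
        if true_issue == predicted_issue:
            correct[grade] += 1
    # Every grade in `total` has at least one result, so no division by zero.
    return {grade: correct[grade] / total[grade] for grade in total}


# Toy, made-up results used purely to show the output format.
toy_results = [
    ("F", "issue_a", "issue_b"),
    ("F", "issue_a", "issue_a"),
    ("VF", "issue_a", "issue_a"),
    ("XF", "issue_b", "issue_b"),
]
print(accuracy_by_grade(toy_results))
```

Reporting a single pooled accuracy over the same results would hide precisely the dependence on grade that a condition-labeled corpus is meant to expose.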

Our Corpus of Roman Imperial Denarii
Considering the concerns and limitations that we identified and discussed related to the data sets used in the existing published literature, we carefully considered a series of issues in collecting and curating the data set introduced herein. We discuss each of these next, and conclude with a summary description of our corpus.

Why Denarii?
As summarized in Section 2.3, there were a range of different denominations used in Imperial Rome during its existence (i.e., from 27 BC, when Augustus became the first emperor, until the fall of Rome in 476 AD). Recalling our aim of collecting a data corpus curated by the condition of coins it contains, there are several reasons for why we decided to focus on denarii in particular.
Firstly, denominations such as dupondii, ases, and sestertii featured rather large and heavy coins. Sestertii, for example, usually weigh between 25 and 28 g, and have a diameter of 32-34 mm. Being of lower value, these coins were also extensively used. For these reasons, they normally suffered significant damage, which makes it difficult to source a sufficient number of examples in different states of preservation. Moreover, the aforementioned denominations were all made of more reactive materials (copper alloys) and, as a result, experience color variability due to reactions with environmental agents (moisture, acids, etc.), thus possibly introducing undesirable confounding factors to the data. Lastly, the use of these denominations declined over time, limiting the range of motifs and styles to a specific period of the Empire, and thus potentially creating a bias in the data.
On the other hand, aurei are extremely rare and difficult to find in the required range of grades-being extremely valuable (25 denarii, or about five weeks' salary of a Roman soldier), they were not used in general circulation and are usually very well preserved. More obscure denominations, such as semises and quadrantes, are also extremely rare, were used only during a short period, and exhibit the same issues related to the alloys they were made of as dupondii, ases, and sestertii, discussed earlier.
Lastly, although common, antoniniani were issued over a period of only six decades, limiting the range of issuing authorities which could be covered, as well as reverse motifs and artistic styles.
Denarii, on the other hand, do not exhibit any of the aforementioned limitations. They were made of comparatively non-reactive silver (confusingly, abbreviated to AR in numismatics, and not Ag as in chemistry) and thus experience little to no discoloration (so-called toning changes are perceptually equivalent to a simple darkening of the surface), were used extensively from the beginning of the empire until the economic crisis of the third century (i.e., for about 300 years), exhibit a wide variability in motifs depicted on them, and were common enough that a diverse set of grades is not overly difficult to find.

Curation
In Section 2.2, we summarized the common standard for grading ancient coins. It is important to note that the descriptions of different grades leave room for some subjectivity-hence, our use of the term 'quasi-objective'. In the context of the present work, the practical significance of this observation lies in the potential difficulty of ensuring accuracy. To ensure that this goal is met, it is imperative that the grading is performed by an expert and that a sufficient number of graders are used so as to average out any potential biases. Thus, all of the data in our corpus are obtained from reputable dealers and auction houses, as are the accompanying labels. An attractive feature of this approach also lies in the self-regulating control of bias-any systematic bias (e.g., overestimation of the state of preservation) would end up being self-defeating, as it would reduce the incentive to purchase the bulk of the coins which are not of the top grade.
As regards the choice of grades sought for inclusion, we focused our attention on three: Fine, Very Fine, and Extremely Fine. This choice was a simple one and was motivated firstly by the observation that coins in a condition worse than Fine are seldom of interest to either scholars or hobby collectors; the exceptions are invariably extremely rare issues. The 'collectable' range, which includes coins in Fine condition or better, is widely relevant to numismatists on the one hand and exhibits major appearance variation (even in the absence of other confounding factors), thus making it of interest to computer scientists.
Indeed, a major difficulty in collecting the present data set emerged precisely from the breadth of conditions sought: denarii that are expensive enough to be sold by reputable auction houses in Fine condition are usually far too rare in Extremely Fine condition, whereas those which are readily found in Extremely Fine condition are seldom offered in Fine condition due to their low cost. Nevertheless, domain expertise helped us to direct our search, and we were successful in collecting 100 different issues, each with multiple specimens in all three grades: Fine, Very Fine, and Extremely Fine.

Data Set Description
Having discussed the reasons underlying our curation criteria and the choices ultimately made, what is left is to describe the data set we collected. We also note that this data set is available freely for research purposes upon request from the corresponding author.
Our corpus of Roman imperial denarii comprises 100 different coin issues, each with multiple examples of specimens in Fine, Very Fine, and Extremely Fine conditions. Each issue usually contains two examples (different specimens) for each condition, and some contain more, giving a total of 626 different specimens (or, equivalently, images). The issues correspond to 44 different issuing authorities and span the time period between 27 BC and 244 AD, i.e., the entire duration of the Empire until the economic crisis of the third century AD, when the consistency and quality of coinage declined severely. The full list of issues included, organized by the issuing authority in alphabetical order, is shown in Table 1.
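For readers who wish to work with a corpus organized in this manner, the following sketch shows one possible in-memory representation and a simple coverage check. The class, field, and function names are hypothetical and do not describe the actual file layout of the data set; the check assumes, following the description above, that each issue has at least two specimens in each of the three grades.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List, Tuple


class Grade(IntEnum):
    """The three grades in the corpus, ordered from most to least worn."""
    FINE = 1
    VERY_FINE = 2
    EXTREMELY_FINE = 3


@dataclass(frozen=True)
class Specimen:
    issue_id: str           # hypothetical identifier of the coin issue
    issuing_authority: str  # e.g. the emperor or empress on the obverse
    grade: Grade
    image_path: str         # hypothetical path to the specimen image


def underfilled_cells(specimens: Iterable[Specimen], issues: Iterable[str],
                      min_per_grade: int = 2) -> List[Tuple[str, Grade]]:
    """List (issue, grade) pairs with fewer specimens than required."""
    counts = Counter((s.issue_id, s.grade) for s in specimens)
    return [(issue, grade) for issue in issues for grade in Grade
            if counts[(issue, grade)] < min_per_grade]


def minimum_specimen_count(n_issues: int, n_grades: int, min_per_grade: int) -> int:
    """Lower bound on corpus size implied by the per-cell minimum."""
    return n_issues * n_grades * min_per_grade


# 100 issues x 3 grades x at least 2 specimens gives a lower bound of 600,
# which is consistent with the 626 specimens stated in the text above.
print(minimum_specimen_count(100, len(Grade), 2))
```

A check of this kind is useful before running grade-stratified experiments, since an empty or underfilled cell would silently bias per-grade statistics.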

Caveat Scolasticus
Before concluding, considering the general spirit of the present article, focused around the issues of research direction in the field of automated ancient coin analysis, methodological rigor, and repeatability, we would like to make note of and highlight one important issue that the community should take notice of in future work. Most directly, our observation concerns SIFT features [51], which have been used extensively in past research [14,15,27,52]. Although we note that these methods have failed to demonstrate much success, it is possible that they will find their use within differently constituted frameworks in the future. Moreover, the general point has much wider applicability.
In particular, we found that different implementations of what nominally appears to be the same deterministic algorithm of a well-known and widely used technique actually differ greatly in the output they produce. For example, we found that on the data corpus introduced in this section, the OpenCV implementation of SIFT results in approximately 1250 detections of local features per image, in contrast to the VLFeat one, which produces approximately 415. This is a huge difference (threefold, or half an order of magnitude) which undoubtedly affects any subsequent processing, representation, and learning. Thus, first and foremost when reporting their findings but also when designing new algorithms, researchers should make sure that they explicitly note the exact implementation of any technique used, no matter how standard, as well as the values of all relevant parameters.
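The reporting practice recommended above can be sketched as a small, machine-readable record stored alongside experimental results. The record type and its field names are hypothetical; the detection counts are the approximate figures quoted above, and the version and parameter entries are deliberately left as placeholders because they are not stated in the text.

```python
import json
import math
from dataclasses import asdict, dataclass, field


@dataclass
class FeatureExtractorReport:
    technique: str
    library: str
    library_version: str  # should be the exact version actually used
    parameters: dict = field(default_factory=dict)  # all relevant parameter values
    mean_detections_per_image: float = 0.0


def detection_ratio(a: FeatureExtractorReport, b: FeatureExtractorReport) -> float:
    """Ratio of mean detections per image between two implementations."""
    return a.mean_detections_per_image / b.mean_detections_per_image


# Approximate figures from the text; versions and parameters are placeholders.
opencv = FeatureExtractorReport("SIFT", "OpenCV", "not stated", {}, 1250.0)
vlfeat = FeatureExtractorReport("SIFT", "VLFeat", "not stated", {}, 415.0)

ratio = detection_ratio(opencv, vlfeat)
# Express the discrepancy both as a factor and in orders of magnitude.
print(round(ratio, 2), round(math.log10(ratio), 2))
# Serialize the record so that it can accompany published results.
print(json.dumps(asdict(opencv), sort_keys=True))
```

The ratio of roughly 3 corresponds to about 0.48 orders of magnitude, in line with the "half an order of magnitude" characterization above.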

Summary
In this paper, we addressed a number of important issues in the increasingly active research domain of computer vision and machine-learning-based analysis of ancient coins, issues which have received insufficient attention to date.
Our first contribution comes in the form of the first overview and discussion of different challenges in the field. Its aim is to clarify the intellectual landscape of automated ancient coin analysis and help direct future research efforts. Indeed, many of the discussed challenges have been scarcely investigated to date, while others have hitherto been unrecognized and remain entirely unaddressed.
Our second contribution concerns the increasingly obvious lack of standardized and appropriately curated data sets, crucial for facilitating methodological evaluation of algorithms and, in particular, the effects that coin preservation has on the performance of different methods. Indeed, until now, only one published work at all recognized the need for this kind of analysis, which, to any numismatist, would be a trivially obvious fact. Hence, we introduce the first data set to be carefully curated and collected for this purpose. We also discuss a wide range of considerations which had to be taken into account in collecting this corpus, explain our decisions, and describe its content in detail. In summary, the data set comprises 100 different coin issues, all with multiple examples in Fine, Very Fine, and Extremely Fine conditions, giving a total of over 650 different specimens (and thus images). These correspond to and depict 44 issuing authorities and span the time period of approximately 300 years, namely from 27 BC until 244 AD.
Lastly, the present article includes the first recognition of the variability in the functioning of different implementations of seemingly identical and standard 'off-the-shelf' techniques. The lack of appreciation of this fact and the lack of reporting of the details of the exact implementations used in experimental evaluations raise serious concerns as regards our understanding of the performance of different algorithms, limit insights that can be derived from the published findings, and bring forth methodological issues faced by the community. Hence, we argue for the need for greater awareness of such considerations and advise researchers to report the relevant details in the future without omission.
Our hope is that the present article should be an invaluable resource to researchers in the field, and we encourage the community to adopt the collected corpus, freely available for research purposes, as a standard evaluation benchmark.