Nanopublication-Based Semantic Publishing and Reviewing: A Field Study with Formalization Papers

With the rapidly increasing amount of scientific literature, it is getting continuously more difficult for researchers in different disciplines to stay up to date with the recent findings in their field of study. Processing scientific articles in an automated fashion has been proposed as a solution to this problem, but the accuracy of such processing remains very poor for extraction tasks beyond the basic ones. Few approaches have tried to change how we publish scientific results in the first place, by making articles machine-interpretable by expressing them with formal semantics from the start. In the work presented here, we set out to demonstrate that we can formally publish high-level scientific claims in formal logic, and publish the results in a special issue of an existing journal. We use the concept and technology of nanopublications for this endeavor, and represent not just the submissions and final papers in this RDF-based format, but also the whole process in between, including reviews, responses, and decisions. We do this by performing a field study with what we call formalization papers, which contribute a novel formalization of a previously published claim. We received 15 submissions from 18 authors, who then went through the whole publication process leading to the publication of their contributions in the special issue. Our evaluation shows the technical and practical feasibility of our approach. The participating authors mostly showed high levels of interest and confidence, and mostly experienced the process as not very difficult, despite the technical nature of the current user interfaces. We believe that these results indicate that it is possible to publish scientific results from different fields with machine-interpretable semantics from the start, which in turn opens countless possibilities to radically improve the effectiveness and efficiency of the scientific endeavor as a whole in the future.


INTRODUCTION
Considering the abundance of scientific articles that are published every day, keeping up with the latest research is becoming a significant challenge for researchers in many fields. This is at least partially due to the fact that we are still holding on to an archaic paradigm of scientific publishing: the canonical way to publish scientific results is by writing them up in long English texts called articles, which are in the best case easy to read by human experts but remain mostly inaccessible to automated approaches (except on a very superficial level with text mining approaches). These articles then undergo peer reviewing, which is typically done in a way that is secretive and not standardized, with the effect that the reviewing process may lack transparency and the valuable comments from the reviewers cannot be reused or built upon. There have been studies on the effectiveness of peer-reviewing in its current form (Smith, 1988; Linkov et al., 2006; Kotturi et al., 2017) that showed not only systematic biases among peer-reviewers, but also a lack of transparency in the peer-reviewing process as a whole (Smith, 2010; Benda and Engels, 2011; Lee et al., 2012). Making reviews open might alleviate some of these concerns by ensuring higher-quality reviews, while at the same time increasing the trust in the reviewing process and the quality of the scientific publications themselves.
A range of approaches have been proposed to address some of these problems by making scientific texts machine-readable, allowing for automatic summarization, easier finding and retrieval of information, and even the ability to (partially) reason on the scientific texts themselves. Text mining approaches work reasonably well when it comes to simple entity extraction with techniques like named-entity recognition to extract the main concepts from a text (e.g. Al-Moslmi et al., 2020; Yadav and Bethard, 2018), but accuracy drops dramatically with more complicated tasks like relation extraction or identifying links between entities (Etzioni et al., 2005; Xu et al., 2015; Zeng et al., 2014).
The vast majority of existing approaches to making scientific texts machine-readable have one thing in common: they take the current paradigm of scientific articles for granted and therefore take them as their starting point to extract information. While it is important to try to process the vast amount of existing scientific literature that has the form of long English texts (and sometimes long texts in other languages), we should also think about how we can improve the way we publish scientific insights in the first place. An important aspect of this is the vision of semantic publishing, which we mean here in the sense of genuine semantic publishing (Kuhn and Dumontier, 2017), where the machine-interpretable formal semantics cover the main scientific claims the work is making. Nanopublications (Groth et al., 2010), which are small RDF-based semantic packages, have emerged as a powerful concept and technology for enabling such genuine semantic publishing.
In previous research we have applied nanopublications to implement a semantic and fine-grained model for reviewing (Bucur et al., 2019), and have extended this to semantically represent the full structure of (classical) scientific articles with their reviews and review responses as a single network of nanopublications (Bucur et al., 2020). In order to get closer to our vision of genuine semantic publishing, however, we need to represent not just the structure but also the main content of these articles, most importantly their main scientific claims. To that aim, we proposed in subsequent work the super-pattern, a semantic template to represent the meaning of scientific claims in formal logic (Bucur et al., 2021).
Taking an example from our previous study as an illustration of the super-pattern, it has been stated in the scientific literature (Felix and Barrand, 2002) that in particular kinds of cells in the rat brain (specifically, endothelial cells) a kind of stress called transient oxidative stress affects the expression of a protein called Pgp. The super-pattern consists of five slots that would in this example be filled in as follows:
• Context class: rat brain endothelial cell
• Subject class: transient oxidative stress
• Qualifier: generally
• Relation: affects
• Object class: Pgp expression
Informally, we can read this in the following way: whenever there is an instance of transient oxidative stress in the context of an instance of a rat brain endothelial cell, then generally (meaning in at least 90% of the cases), that instance of stress has the relation of affecting an instance of Pgp expression. Formally, it directly maps to a logic formula stating (in slightly non-standard notation using conditional probability as a shorthand) that given a thing y of type transient-oxidative-stress in the context of a thing x of type rat-brain-endothelial-cell, the probability of there being a z of type pgp-expression that is in the same context x and that is affected by y is at least 90%. We have shown that this pattern can be applied to formalize most high-level claims found in scientific literature across disciplines (Bucur et al., 2021).
In the work to be presented below, we combine all these elements of our previous work (namely the semantic representation of reviews, scientific works as networks of nanopublications, and the representation of the main claims with the super-pattern) in order to implement genuine semantic publishing and put it to the test in a field study. For practical reasons, we did not require the scientific claims in this field study to be novel ones; instead, they were selected from existing publications. This field study led to the publication of a special issue in an established journal (Data Science) at an established publisher (IOS Press). This special issue consists of what we call formalization papers, which are nanopublication-based semantic publications whose novelty lies in the formalization of a previously published scientific claim.
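Written out, the formula described above could look as follows. This is a sketch reconstructed from the verbal description only; the predicate name in-context-of and the exact layout are assumptions and may differ from the notation used in (Bucur et al., 2021).

```latex
\[
\forall x \, \forall y \;\;
P\Bigl(
  \exists z \, \bigl(
    \text{pgp-expression}(z)
    \wedge \text{in-context-of}(z, x)
    \wedge \text{affects}(y, z)
  \bigr)
  \;\Bigm|\;
  \text{rat-brain-endothelial-cell}(x)
  \wedge \text{transient-oxidative-stress}(y)
  \wedge \text{in-context-of}(y, x)
\Bigr) \geq 0.9
\]
```

The condition to the right of the bar corresponds to the context and subject slots, the existential statement to the left of it corresponds to the relation and object slots, and the threshold 0.9 encodes the qualifier "generally".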

In this research we therefore aim to answer the following research question:
• Are nanopublications and the super-pattern appropriate concepts to enable a new paradigm of scientific communication where authors publish their scientific findings with formal semantics?
The rest of this article is structured as follows. In Section 2 we describe the current state of the art in the field of scientific publishing with regard to scientific knowledge representation, semantic publishing and semantic articles and also alternative proposed machine-readable approaches like nanopublications. In Section 3 we describe our approach with regard to a new way of publishing, starting from a formal way of representing the content of scientific claims and ending with the representation of the publication process itself in what we call "formalization papers". We then report and discuss the results of the field study we performed using formalization papers in Section 4. Future work and conclusion of the present research are outlined in Section 5.

BACKGROUND
We provide here the background on scientific knowledge representation, scientific publishing, and nanopublications in particular.

Scientific knowledge representation
Novel proposals for the current "Disruption Era" (Rahardja et al., 2019) include scientific publication management models that connect abstract knowledge with actual world problems in the constantly growing body of scientific knowledge (Chi et al., 2018), and the use of decentralized publication systems for open science using, for example, existing technologies like Blockchain and IPFS (Tenorio-Fornés et al., 2019).
With respect to the growth of scientific literature, there is a trend towards multidisciplinary and interdisciplinary research, together with an increased volume of publications every year (Wong, 2019). A range of methods have been proposed to make scientific articles more machine-readable: from structuring scientific works as Research Objects (RO) (Bechhofer et al., 2013; Belhajjame et al., 2015) to using facets in order to uncover the main methods, data, code and other objects that are used in scientific articles (Peroni et al., 2013; McGregor, 2008). Most approaches, however, have focused on automated content extraction from scientific articles as they are currently available. Recent machine learning techniques, for example, can extract the main concepts and structure of scientific articles after training on large sets of such articles (Xu et al., 2015; Zeng et al., 2014). While the results can be very valuable, there are also clear limitations, with the resulting data almost always needing manual curation to achieve decent quality (Garcia-Castro et al., 2013; Coulet et al., 2011; Sernadela et al., 2015). A significant number of vocabularies and ontologies in various domains have been developed, which are now ready to be used for scientific knowledge representation. However, they often remain difficult to find, access and understand due to the lack of documentation, versioning problems, and unresolvable URIs, among other things (Garijo and Poveda-Villalón, 2020; Halpin et al., 2010; Hitzler and van Harmelen, 2010; Jain et al., 2010). A considerable amount of attention has also been given to the datasets accompanying scientific articles. The Data Set Knowledge Graph (DSKG), for example, covers datasets from over 600k scientific publications (Färber and Lamprecht, 2021). An important development in this respect is the strong momentum behind the FAIR initiative to make research data Findable, Accessible, Interoperable, and Reusable (Wilkinson et al., 2016).
A large amount of research is ongoing on how these FAIR principles can be put into practice (e.g. Garijo and Poveda-Villalón, 2020). Many other aspects of scientific communication have been approached with more formal representations, such as declaring authorship contributions with the Contributor Roles Taxonomy (McNutt et al., 2018), to mention just one of them.
Semantic technologies have been used extensively in the Life Sciences, e.g. for the representation and discovery of concepts, their relationships and associated supporting evidence in order to integrate distributed repositories (Hannestad et al., 2021). A variety of controlled vocabularies exist in these fields that can serve as the foundation to represent scientific knowledge in a structured way in order to semantically capture the context of scientific findings (Chibucos et al., 2014; Slater and Song, 2012; Madan et al., 2019). The BEL language (Slater, 2014) is one of the few attempts to represent the high-level scientific claims themselves, with coverage for specific kinds of biological relations. Many other domains besides the Life Sciences have also adopted the principles and technologies of Linked Data and the Semantic Web, for example to build interlinked, heterogeneous, and semantically rich datasets in Cultural Heritage (Hyvönen, 2012) and to find, address, and sometimes even solve research problems in Digital Humanities in interactive ways (Hyvönen, 2020).

Semantic publishing and semantic papers
Semantic publishing applies semantic technology to scientific publishing. It comes in many forms and does not always align with what we have introduced above as genuine semantic publishing (Kuhn and Dumontier, 2017). Under this umbrella of semantic publishing, there are approaches that generate semantically-enriched data models from digital publications for the integration, sharing, management and data comparison between publications (Perez-Arriaga, 2018), study the semantic annotation and enhancement of scholarly articles (Shotton, 2009), provide dynamic visualizations in semantically enhanced papers (Senderov and Penev, 2016), assess the versioning aspect of semantic publishing (Papakonstantinou et al., 2018), create a global-scale platform with dataset metadata for automated ingestion, discovery, and linkage (Jacob et al., 2017), and propose semantic and web-friendly HTML-based alternatives to the currently PDF-focussed scientific writing process (Peroni et al., 2016). Semantic enhancements of scientific articles can be used for semantic interlinking, interactive figures, re-orderable references and even summary creation, and workflows to convert regular scientific articles into Linked Open Data have also been investigated (Sateli and Witte, 2016). Other approaches like the compositional and iterative semantic enhancement (CISE) advocate for a process of automatic semantic enhancement with semantic annotations (Peroni, 2017). A key role in most of these approaches is played by the variety of existing ontologies covering many different aspects of scientific publishing, most importantly the Semantic Publishing and Referencing (SPAR) Ontologies (Peroni, 2014).
Further noteworthy approaches include the work to semantically represent the setup and results of scientific studies, which then allows for running meta-analyses in a semi-automated way, better research replication, and automated hypothesis generation (Tiddi et al., 2020), and the development of the Open Research Knowledge Graph (Jaradeh et al., 2019). The latter is an initiative that aims to make research articles machine-readable by expressing their main scientific entities as a semantically interconnected knowledge graph. This graph is populated by methods such as extracting scientific concepts from the abstracts of scientific articles with the help of annotators (Brack et al., 2020).

Nanopublications
Nanopublications (Groth et al., 2010) are a specific concept and technology that deserves special attention here. They have been proposed to express scientific (and other kinds of) knowledge in Linked Data as small independent publication packages. They allow for rich provenance and metadata and are structured as follows: the assertion part contains the main content of the nanopublication, such as a scientific claim, expressed as RDF triples. The provenance part of a nanopublication describes how the assertion came about, e.g. by linking to the scientific methods used to arrive at the finding. The publication information part, finally, contains metadata about the nanopublication as a whole, such as by whom and when it was created. Nanopublications can be used for scientific findings, but also for representing the other elements of the scientific workflow, such as reviews and method descriptions, and more generally any kind of small coherent set of RDF triples (Kuhn et al., 2013). It has been shown how nanopublications can be made reliable and immutable by identifying them with cryptographic Trusty URIs (Kuhn and Dumontier, 2015), and how this allows for a decentralized network of services and template-based user interfaces such as Nanobench.
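To make the three-part structure concrete, the following Python sketch models a nanopublication as plain data and prints it in a TriG-like layout. It is only an illustration under stated assumptions: the class name Nanopublication, the example URIs and the example triples are hypothetical, prefix declarations for np:, prov:, dct: and rdfs: are omitted, and no Trusty URI is computed. Real nanopublications are created with dedicated tools such as Nanobench rather than with code like this.

```python
from dataclasses import dataclass, field

# A triple is (subject, predicate, object), already written in prefixed form.
Triple = tuple[str, str, str]


@dataclass
class Nanopublication:
    """Toy model of a nanopublication; names and layout are illustrative only."""

    uri: str
    assertion: list[Triple] = field(default_factory=list)   # the main content, e.g. a claim
    provenance: list[Triple] = field(default_factory=list)  # how the assertion came about
    pubinfo: list[Triple] = field(default_factory=list)     # metadata about the nanopublication

    def head(self) -> list[Triple]:
        # The head graph ties the three content graphs together.
        return [
            ("this:", "a", "np:Nanopublication"),
            ("this:", "np:hasAssertion", "sub:assertion"),
            ("this:", "np:hasProvenance", "sub:provenance"),
            ("this:", "np:hasPublicationInfo", "sub:pubinfo"),
        ]

    def to_trig(self) -> str:
        graphs = {
            "sub:Head": self.head(),
            "sub:assertion": self.assertion,
            "sub:provenance": self.provenance,
            "sub:pubinfo": self.pubinfo,
        }
        lines = [f"@prefix this: <{self.uri}> .", f"@prefix sub: <{self.uri}#> ."]
        for name, triples in graphs.items():
            lines.append(f"{name} {{")
            lines.extend(f"  {s} {p} {o} ." for s, p, o in triples)
            lines.append("}")
        return "\n".join(lines)


# Hypothetical example content; none of these URIs refer to real resources.
example = Nanopublication(
    uri="http://example.org/np1",
    assertion=[("sub:claim", "rdfs:label", '"an example claim"')],
    provenance=[("sub:assertion", "prov:wasDerivedFrom", "<http://example.org/source-paper>")],
    pubinfo=[("this:", "dct:creator", "<http://example.org/author>")],
)
print(example.to_trig())
```

Keeping the three parts as separate named graphs is what allows provenance and publication metadata to be stated about the assertion and about the nanopublication as a whole without mixing them into the claim itself.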

METHODOLOGY
In this section we describe the approach and methods we followed to investigate whether nanopublications and the super-pattern are suitable to achieve genuine semantic publishing.

Approach
In our approach, we committed to a number of features. First, we wanted the final contributions to be published as "real" papers in a real established journal. They should be fully semantically represented (in RDF) but also have classical views that make them look like other papers. That way, they should also seamlessly integrate with the existing bibliometric system and it should be straightforward to cite them in the classical way. Second, we decided to fully focus on arguably the most interesting element of scientific articles, which happens to also be one of the most challenging to formally represent: the main scientific claims the article is making. Scientific articles have a large number of other interesting pieces of information, e.g. information about the methods used, among many other things, but for the purpose of the study to be presented, we focus only on the main claims.
Third, in order to retain the flexibility and power of nanopublications, we decided to refrain from providing a custom-built and optimized user interface that hides the complexity and limits the flexibility. By using generic template-based nanopublication tools and by customizing them solely by providing the templates, we hoped to get a better understanding of how the nanopublication technology works for such kinds of content and workflows in general, and not just for our specific case. On the other hand, this also means that we were looking for a bit more technically minded authors who can handle interfaces that do not come with all the comfort of polished specific applications.
Fourth, we wanted to test a system that could be used to publish novel claims, but decided for practical reasons to focus on formalizing claims from previously published articles. Our approach is therefore based on what we call formalization papers that contribute novel formalizations of existing claims.
Finally, we wanted to cover not just these main claims, but the whole publishing workflow that involves the initial submission of contributions, their reviewing, the responses to the reviews, the updated versions, and the final decision, and represent these as independent but interlinked nanopublications.

Formalization Papers
Our approach builds upon our new concept of formalization papers. A formalization paper contributes a semantic formalization of one of the main claims of an already published scientific article. Its novelty therefore lies solely in the formalization of a claim, not the claim itself. The authors of such formalization papers consequently take credit for the way the formalization is done, but not for the original claim (unless that claim happens to come from the same authors).
The content of a formalization paper is fully expressed in RDF in the form of nanopublications. Such a formalization paper can be shown in other formats to users, e.g. in HTML or PDF, but these are just views of the same underlying RDF content. Our formalization papers consist of nanopublications in which the assertion contains the formalization of the scientific claim using the super-pattern (Bucur et al., 2021), the provenance points to the original paper of the claim, and the publication information attributes the author of the formalization. Figure 1 shows an example of such a nanopublication in the interface the participants of our study used to create them. The instantiated super-pattern in the assertion part refers to a context class, a subject class, a qualifier, a relation type, and an object class according to the super-pattern ontology 1 . In the process of coming up with such a formalization, one often realizes that for some of the class slots of the super-pattern (i.e. context, subject, and object class) the class that should be filled in to arrive at a correct formalization is not directly defined in any existing vocabulary or ontology and as such, this class might need to be minted as well. The provenance part of the nanopublication describes the "formalization activity" that was conducted in order to arrive at this formalization from what is written in the source publication. The precise phrase from that source publication that was used can be quoted too.

Tools
In order to publish formalization papers, class definitions, and all the other kinds of nanopublications (submissions, reviews, responses to reviews, and decisions), we use Nanobench (Kuhn et al., 2021) 2 . Figure 1 introduced above shows a screenshot of the publishing page of Nanobench. Publishing in Nanobench is based on templates, which are themselves expressed in nanopublications. The form shown in the screenshot is automatically generated based on the information found in several template nanopublications that we created and published for that purpose. All the application-specific behavior is therefore semantically represented in the templates, and Nanobench can flexibly be used for any other kind of data and workflow.
The second tool that we are using, Tapas (Lisena et al., 2019) 3 , is equally generic. It is a simple user interface component built on top of grlc (Meroño-Peñuela and Hoekstra, 2016) that allows running template-based SPARQL queries on RDF triple stores. In our case, we run it on SPARQL endpoints provided by the nanopublication service network. We use Tapas to show aggregations and overviews of submissions and reviews. Figure 2 shows a screenshot of the main submission overview. Tapas by itself is read-only, but we connect it to the Nanobench tool with links that lead to partially filled-in forms (e.g. "click here to add review" in the screenshot).

Field study design
In order to test our approach, we devised a field study where interested authors could submit formalization papers, which upon acceptance are to be published as a special issue in the journal Data Science 4 by IOS Press. The goal of this was to demonstrate for the first time that scientific articles, including their main scientific claims, can be formalized and thereby made machine-interpretable. As a secondary goal, we wanted to find out whether nanopublications are a good technology for that, and whether it is feasible to also represent the entire submission and reviewing process within the same framework.
Because the user interfaces we have at our disposal are still quite rough and technical, we restricted the set of possible authors and sent the call for papers on a by-invitation basis to selected groups of researchers who have previously worked or had experience with technologies like RDF and semantics. We expect to be able to build more accessible user interfaces in the future that can show the inherent complexity in a way that does not require technical skills, but how this can be achieved is out of scope for this work.
Participants in our field study, that is, the authors of formalization papers, formalized either their own previously published claim or a claim from a paper published by others. In the latter case, the formalization paper authors take credit for the formalization of the claim but not for the claim itself. All submissions to this special issue were peer-reviewed (also as nanopublications) using our previously proposed reviewing ontology (Bucur et al., 2019). Upon acceptance, these formalization papers are to be published in a journal at IOS Press, thereby giving them the same bibliometric status as other scientific articles, which leads to regular indexing in scientific article databases, counting of citations, and so on.
The whole timeline of the field study that encompasses the special issue can be seen in Figure 3. The authors received close guidance on how to represent a claim of their choosing in RDF using the super-pattern and nanopublications, and on the various stages of the publication process. Authors took part in several information sessions and discussion meetings and were provided at each step with helper materials, videos, and even direct assistance if needed. In total, 24 such individual sessions were organized from May to December 2021.
In order to define a formalization, sometimes some of the class slots (i.e. context, subject, and object slots) of the super-pattern need to be filled in with classes that are not yet defined in any existing vocabulary or ontology. In this case the authors first had to define these themselves, and they could also do that with the Nanobench tool by loading a template for class definition. (Alternatively, they could also mint a new class identifier by other means, such as creating it on Wikidata.) The assertion of a nanopublication defining a new class may look for example as follows (link to full nanopublication):

    sub:STX1B-mutation a owl:Class ;
        rdfs:subClassOf wd:Q42918 ;
        rdfs:label "STX1B mutation" ;
        skos:definition "mutation in STX1B" ;
        skos:relatedMatch wd:Q18048867 .

Here, "mutation" from Wikidata (Q42918) is declared as super-class of the newly minted class "STX1B mutation", and "STX1B" (Q18048867) is linked as a related class.
Then the authors could publish their formalization in the form of a nanopublication using Nanobench (see Figure 1), and afterwards they needed to submit it to the special issue using another Nanobench template, leading to an assertion like (link to full nanopublication):

    <http://purl.org/np/RAGo62Hb_Bx1klF4pn1q1Ty40860e3A7Sz4hr2vojZ2wA>
        pso:withStatus pso:submitted ;
        frbr:partOf fpsi:DataScienceSpecialIssue .

All submitted formalizations were subsequently reviewed. All authors were encouraged to review other submissions, and these reviews were semantic, open, and non-anonymous. These reviews were again done in nanopublications with the Nanobench tool. An example of a nanopublication assertion that contains a review modeled using the reviewing ontology can be seen below (link to full nanopublication):

    sub:comment a lfr:ReviewComment , lfr:ContentComment , lfr:NeutralComment , lfr:SuggestionComment ;
        lfr:hasCommentText "Maybe the use of a causal relation like \"contributes to\" can also be used here." ;
        lfr:hasImpact "1" ;
        lfr:refersTo <http://purl.org/np/RAGo62Hb_Bx1klF4pn1q1Ty40860e3A7Sz4hr2vojZ2wA> ;
        lfr:refersToMentioningOf sp:hasRelation .

In such a structured review (see more details in our previous research (Bucur et al., 2020)), it is possible to specify various aspects that the review addresses, including the aspect it comments on (syntax, style or content), the positivity/negativity of the review, the action that needs to be taken by the authors as the reviewers see it (whether it is compulsory to be addressed, a suggestion, or no action needs to be taken by the authors), and the impact, that is, the importance of the point made by the review for the overall quality of the formalization. In the above example, the review comment makes a neutral point about the content of the given formalization with an importance of 1 out of 5, and is marked as a suggestion for the authors. The specific part of the formalization that this review targets is the sp:hasRelation field, as indicated by the refersToMentioningOf relation.
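The dimensions of such a structured review comment can be sketched in a few lines of Python. This is a minimal illustration and not part of Nanobench or the reviewing ontology itself: the function decode_review and the class ReviewComment are hypothetical, only the type names lfr:ContentComment, lfr:NeutralComment and lfr:SuggestionComment are taken from the example above, the remaining type names are assumed by analogy, and the impact scale of 1 to 5 is an assumption based on the "1 out of 5" reading.

```python
from dataclasses import dataclass

# Type names not shown in the example above are assumptions made for illustration.
ASPECT = {
    "lfr:SyntaxComment": "syntax",    # assumed name
    "lfr:StyleComment": "style",      # assumed name
    "lfr:ContentComment": "content",  # from the example
}
POSITIVITY = {
    "lfr:PositiveComment": "positive",  # assumed name
    "lfr:NeutralComment": "neutral",    # from the example
    "lfr:NegativeComment": "negative",  # assumed name
}
ACTION = {
    "lfr:ActionNeededComment": "compulsory",     # assumed name
    "lfr:SuggestionComment": "suggestion",       # from the example
    "lfr:NoActionNeededComment": "no action",    # assumed name
}


@dataclass
class ReviewComment:
    aspect: str
    positivity: str
    action: str
    impact: int
    target: str
    text: str


def _pick(types: list[str], mapping: dict[str, str], dimension: str) -> str:
    """Return the single value of one review dimension, or fail loudly."""
    found = [mapping[t] for t in types if t in mapping]
    if len(found) != 1:
        raise ValueError(f"expected exactly one {dimension} type, got {found}")
    return found[0]


def decode_review(types: list[str], impact: str, target: str, text: str) -> ReviewComment:
    impact_value = int(impact)
    if not 1 <= impact_value <= 5:  # assumed scale of 1 to 5
        raise ValueError(f"impact out of range: {impact_value}")
    return ReviewComment(
        aspect=_pick(types, ASPECT, "aspect"),
        positivity=_pick(types, POSITIVITY, "positivity"),
        action=_pick(types, ACTION, "action"),
        impact=impact_value,
        target=target,
        text=text,
    )


# The review comment from the example above, entered by hand.
review = decode_review(
    types=["lfr:ReviewComment", "lfr:ContentComment", "lfr:NeutralComment", "lfr:SuggestionComment"],
    impact="1",
    target="sp:hasRelation",
    text='Maybe the use of a causal relation like "contributes to" can also be used here.',
)
print(review.aspect, review.positivity, review.action, review.impact)
```

Because each dimension is expressed as an RDF type, a review comment can be filtered or aggregated along any of them, for instance to list all compulsory content comments for a given submission.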
Subsequently, authors of the submissions could respond to the received review comments, again in nanopublications, and update their submissions based on these review comments. This is an example of a response to a review comment (link to full nanopublication):

    sub:comment a lfr:ResponseComment , lfr:DisagreementComment , lfr:PointNotAddressedComment ;
        lfr:hasCommentText "I don't think the original publication shows a causal relationship. It seems to me only a correlation is proven." ;
        lfr:isResponseTo <http://purl.org/np/RAio--7IbPa3_ZSG3GspUsXeWP2ZwMIzy4Kzos0yZ7NIw> ;
        lfr:refersTo <http://purl.org/np/RAeRSya2qIYymsBxiqOZP_oaQpHXUVXiydKvPCFM-7DDQ> .

This response registers the level of agreement with the point made by the reviewer (whether the author agrees totally, partially or not at all) and whether that point was addressed, partially addressed or not addressed at all by the author. Moreover, a link to the respective review is given using the isResponseTo relation, while the updated version of the formalization is indicated using the refersTo relation. In our example, we see that the author does not agree with the point made by the reviewer and hence did not address it, and also gives a textual explanation of why this is the case.
Finally, the authors updated their formalizations with the same template as depicted in Figure 1. The full final formalization nanopublication of the same example is shown in Figure 4. For all updated submissions, a decision about their acceptance was then made by us as the special issue editors. This decision was also represented as a nanopublication that looked as follows (link to full nanopublication):

    <http://purl.org/np/RAeRSya2qIYymsBxiqOZP_oaQpHXUVXiydKvPCFM-7DDQ>
        dct:description "All review comments were addressed and the formalization looks good." ;
        pso:withStatus pso:accepted-for-publication ;
        frbr:partOf fpsi:DataScienceSpecialIssue .

All formalizations reached a satisfactory level of quality, as indicated by the reviews and the authors' responses, and we therefore accepted all 15 submissions for publication.
In order to show the accepted papers in the special issue as if they were classical papers, to integrate them in the publisher's content management system, and to make them connect to the existing bibliometric system, we semi-automatically created "classical views" in the form of HTML and PDF versions of the nanopublications, as can be seen in Figure 5.

User Feedback
In order to evaluate the general idea of formalization papers, all participants in the field study were asked to give us their opinion and report on their experiences about the involved processes and concepts. This evaluation was performed by means of a structured questionnaire consisting of four main parts, each one evaluating different aspects of the workflow.

Figure 5. The "classical view" of a formalization paper, as it will appear on the publisher's website.

In the first part, we are interested in assessing the difficulty of conceptually understanding the formalization paper idea and the super-pattern, and of performing the formalization tasks. In part two, we focus on the difficulty of the technical aspects in the various submission, reviewing and revision stages. Part three addresses some more general aspects about the authors' experience and preferences.
Authors were asked about their confidence in the formalization they published and about their interest in publishing such formalizations alongside their scientific publications in the future. We also asked them how important they think it is that all these steps are performed by the authors themselves (as they did). Moreover, in this part, authors could give us their opinion with regard to the importance of having a "classical view" alongside the nanopublication representation of their formalization paper. The fourth and final part of the questionnaire asked for the technical background of the authors. At the very end, the respondents could give further free-text feedback. The full questionnaire is available in our supplemental material 5 .

RESULTS
In this section we present the formalizations that resulted from our field study. We present a descriptive analysis of the generated data and analyze it also with the help of a network visualization. Finally, we report on the results from the user feedback questionnaire.
All nanopublications created during the field study (the submissions, the formalization paper versions, the review comments, the responses to the review comments, the newly minted classes used in the formalizations, and the decisions) are accessible online 6, while the nanopublication index containing all these nanopublications has also been published 7. The final submissions for the special issue with formalization papers at the Data Science journal 8 can also be found online 9.

Analysis of Formalizations
Initially, 20 people replied to our call for papers 10. They came from 12 different institutions in the United States of America, Germany, Luxembourg, Bulgaria, and The Netherlands, and from fields like biomedicine, bioinformatics, health sciences, ecology, data science, and computer science. After an initial information session, 18 of these 20 authors decided to continue their participation. All 18 of them managed in the end to publish (upon acceptance) their articles in a special issue at the Data Science journal.
We had a total of 15 formalization paper submissions, 13 with individual authors and 2 with joint authorship. Out of the total of 18 authors, two have both an individual submission and a joint-authorship one. The super-pattern instantiations of the final accepted formalization paper submissions can be seen in Table 1. Here, the classes used to instantiate the super-patterns that comprise the formalizations are given for each submission: the context, subject, and object classes are listed, together with the qualifier and relation selected from the SuperPattern ontology (Bucur et al., 2021). Each instantiation of the super-pattern can be interpreted as follows: "Every thing of type [SUBJECT] that is in the context of a thing of type [CONTEXT] [QUALIFIER] has a relation of type [RELATION] to a thing of type [OBJECT] that is in the same context."
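To make the reading of this template concrete, the following minimal Python sketch fills the five slots of the super-pattern and renders the resulting sentence. It is an illustration only: the class name `SuperPattern`, its field names, and the example values (including the `ex:` identifiers and the qualifier `sometimes`) are hypothetical and are not taken from Table 1 or from the SuperPattern ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperPattern:
    """One instantiation of the super-pattern (hypothetical field names)."""

    context_class: str
    subject_class: str
    qualifier: str
    relation: str
    object_class: str

    def to_sentence(self) -> str:
        """Render the instantiation using the interpretation template from the text."""
        return (
            f"Every thing of type [{self.subject_class}] that is in the context of "
            f"a thing of type [{self.context_class}] [{self.qualifier}] has a relation "
            f"of type [{self.relation}] to a thing of type [{self.object_class}] "
            f"that is in the same context."
        )


# Placeholder values chosen for illustration; they do not come from any submission.
example = SuperPattern(
    context_class="ex:Context",
    subject_class="ex:Subject",
    qualifier="sometimes",
    relation="ex:affects",
    object_class="ex:Object",
)
print(example.to_sentence())
```

Keeping the five slots as separate fields, rather than as one free-text sentence, mirrors the idea that the formal statement is primary and the English rendering is derived from it.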
Looking at Table 1, we see that the super-pattern instances cover quite a broad variety of scientific fields (bioinformatics, biomedicine, pharmacology, data science, computer science), mostly linked to the life sciences. 7 out of the 15 submissions contain a formalization in which authors extracted a scientific claim from their own previously published article (submission number marked with ). Additionally, out of the total 44 classes used in the formalizations, 22 new classes were minted using Nanobench (marked with *), while 4 were newly minted Wikidata classes (marked with **). 13 already-existing classes were reused from Wikidata (their Wikidata identifier is specified next to the class name) and 4 classes were referenced from other ontologies.

Table 1. Instantiated super-patterns accepted for publication in formalization papers in the Data Science special issue. Submissions marked with are formalizations in which authors extracted a scientific claim from their own previously published article; classes minted using Nanobench are marked with *, while newly minted Wikidata classes are marked with **.

5 https://github.com/LaraHack/formalization papers supplemental/tree/main/questionnaire
6 https://github.com/LaraHack/fpsi analytics/tree/main/nanopubs
7 Nanopublication index: http://purl.org/np/RAkLJW7vIsnKKJDf1iswdgtFPQSo3lEG z8DhHfD7dofE
8 The link will be added in late March 2022, when the special issue is due for publication.
9 https://github.com/LaraHack/formalization papers supplemental/tree/main/accepted submissions
10 https://github.com/LaraHack/formalization papers supplemental/tree/main/call for papers

Analysis of Nanopublications
Table 2 shows statistics about the nanopublications created during our field study 11. It lists a total of 15 submissions with their 15 corresponding super-pattern definitions. There are 25 updated super-patterns, indicating that some of the submissions were updated more than once. 34 new classes were minted in nanopublications as class definitions, which were subsequently used in the formalizations. With regard to the reviews received and the author responses, class definitions received around 3 reviews per class on average, while the super-pattern definitions received almost 8 review comments on average. In terms of the responses given to these reviews, the average number of responses for class definitions was a little over 2, while the average number of responses to the review comments for the super-pattern definitions was about 6.7.
Figure 6 shows a graphical representation of all the special issue nanopublications, where each node represents one nanopublication and the arrows between the nodes show how the nanopublications are semantically linked with each other. The legend for the node types, indicated by color and letter code, can be found in Table 2.
For every formalization paper, we see a first formalization (F) together with a submission nanopublication (S). Later updated versions (U) of formalizations also link back to the initial formalization. The initial submissions received review comments (R), to which authors then replied with response nanopublications (A). Additionally, some of the formalization papers used newly minted classes (C), which then also received review comments and responses. The final decision (D) points to the finally accepted updated formalization. The edges (i.e. arrows) of the graph indicate that a nanopublication refers to another one by using its identifier in the assertion. The edges shown in red are superseding relations, pointing from a new version of a nanopublication to its previous version. This is how nanopublications, being immutable, represent new versions.
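The linking scheme described above can be sketched as a small directed graph with two kinds of edges. The Python sketch below is an illustration under stated assumptions: the class `NanopubGraph`, its methods, and the identifiers `F1`, `U1`, etc. are hypothetical and do not correspond to actual nanopublication identifiers or to any existing nanopublication library. It also assumes that each version is superseded by at most one newer version.

```python
from dataclasses import dataclass, field
from typing import Optional

# Letter codes for node types as used in the text.
NODE_TYPES = {
    "S": "submission",
    "F": "first formalization",
    "U": "updated formalization",
    "R": "review comment",
    "A": "response",
    "C": "class definition",
    "D": "decision",
}


@dataclass
class NanopubGraph:
    """Toy model of the nanopublication network (hypothetical structure)."""

    node_types: dict = field(default_factory=dict)   # identifier -> letter code
    references: set = field(default_factory=set)     # (source, target) pairs
    supersedes: dict = field(default_factory=dict)   # new version -> previous version

    def add(self, np_id: str, np_type: str, refers_to=(),
            supersedes: Optional[str] = None) -> None:
        """Add an immutable node; a new version is a new node with a superseding edge."""
        if np_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {np_type}")
        self.node_types[np_id] = np_type
        for target in refers_to:
            self.references.add((np_id, target))
        if supersedes is not None:
            self.supersedes[np_id] = supersedes

    def history(self, np_id: str) -> list:
        """Follow superseding edges from a version back to the first one."""
        chain = [np_id]
        while chain[-1] in self.supersedes:
            chain.append(self.supersedes[chain[-1]])
        return chain

    def latest_version(self, np_id: str) -> str:
        """Follow superseding edges in reverse to the newest version."""
        newer = {old: new for new, old in self.supersedes.items()}
        while np_id in newer:
            np_id = newer[np_id]
        return np_id


# Hypothetical life cycle of a single formalization paper.
g = NanopubGraph()
g.add("F1", "F")
g.add("S1", "S", refers_to=["F1"])
g.add("R1", "R", refers_to=["F1"])
g.add("U1", "U", supersedes="F1")
g.add("A1", "A", refers_to=["R1", "U1"])
g.add("U2", "U", supersedes="U1")
g.add("D1", "D", refers_to=["U2"])

print(g.history("U2"))
print(g.latest_version("F1"))
```

Because existing nodes are never modified, the full revision history remains recoverable by walking the superseding edges in either direction.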

User Feedback Analysis
The 18 authors and co-authors of the formalization papers were asked to fill in the user feedback questionnaire. It was important for this questionnaire to be fully and reliably anonymous, as the authors needed to be able to give their honest opinions. This meant that we had to send reminders without knowing who had already filled it in. After several rounds of reminders, we ended up with 19 responses, meaning that at least one of the authors submitted two responses. Due to the anonymous nature of the questionnaire, it was not possible to find out which responses were affected, and we therefore have to deal with a dataset of slightly imperfect representation.
In Figure 7 we see the results for the first part of the questionnaire. Authors expressed that it was rather easy to understand what a formalization paper is (with a score of 4.32 out of 5). The elements of the super-pattern were found a bit harder to understand but still quite easy, with scores above 3.50. Finding an article from which to select a claim to formalize and understanding what the chosen claim really meant were also deemed easy, with scores of 3.74. The actual instantiation of the super-pattern with all its fields given the chosen claim was considered a little more difficult, with scores around 3.0, indicating medium difficulty roughly halfway between very difficult and very easy. These results suggest that the authors were able to understand the main idea of formalization papers together with the super-pattern it builds on, whereas the actual instantiation of the super-pattern (especially concerning the context and subject class) was considered a little more difficult, but still on average far from very difficult.
In Figure 8, we see the authors' responses with respect to technical difficulty. In terms of the tools used, we see that setting up and using Nanobench was considered easy enough (with a score of 3.30), while the Tapas interface seems a little harder to use (with a score of 2.76). The different tasks in the different stages all seemed to be between medium and easy on average, with the exception of the tasks to provide responses to reviews, which scored slightly below 3.0. The response nanopublications are indeed among the most complex ones, as they refer not only to the affected review but also to the updated formalization. Overall, while these results show room for improvement, they still seem favorable given that we were building upon generic and powerful tools without specific user interface design or polishing.
Figure 9 summarizes the assessment of more general aspects of formalization papers and also contains information about the authors' background. We see that authors have high confidence in the quality of their formalization, with an average score of 4.0, and that they are interested in publishing such formalizations alongside their scientific publications in the future, with a score of 4.05. The respondents very clearly stated that the classical view of formalization papers is important for website visitors, with a score of 4.68. Also exposing the "naked" nanopublications to the website visitors with a nanopublication view was found to be much less important (3.32).
The authors indicated that they have, on average, a high level of knowledge on the topics of knowledge representation, knowledge graphs, Linked Data, and ontologies/vocabularies, with scores from 3.89 to 4.00. Their background in nanopublications, formal logic, and programming languages was noticeably lower, on average, but still relatively high, with scores between 3.05 and 3.47.
10 out of the 18 authors used the free-text feedback field of the questionnaire. 8 of these 10 respondents expressed their excitement about the field study and found the formalization paper concept and the whole publication process interesting and useful. However, half of these respondents also mentioned that the overall process proved to be a little more difficult than they expected, possibly because the tools used were too technical. One author also pointed out that multiple formalizations can be written for the same claim by choosing the context, subject, and object classes differently, and expressed the worry that this would decrease the interoperability or utility of formalizations, especially when aggregating or mining them. This is a reasonable point to make, but due to the fully formal semantics, syntactic differences do not in principle hinder this kind of interoperability. Overall, the super-pattern, the formalization paper concept, and the nanopublication-based publication workflow seem to have been well accepted and understood by the participants, and many of them showed an enthusiastic reaction.

DISCUSSION AND CONCLUSION
The publication of the special issue with formalization papers at the Data Science journal shows not only that nanopublications and the super-pattern can be used to implement the basic steps and entities of a journal workflow, but also that authors of such formalization papers can be taught to use them in order to publish in such a novel workflow. Our results show that the super-pattern can be well understood conceptually and that, although applying it in practice seems to be more difficult, its application remains perfectly feasible. Furthermore, we saw in our field study that the current general-purpose tools are a viable solution for publishing formalization papers, even though they are not necessarily easy to use. Moreover, authors seem confident with regard to the quality of their formalization papers and seem interested in publishing such formalizations in the future.
In future work, we plan to take the next logical step by publishing novel claims in this way from the start, without depending on claims from already-published papers. These contributions will then also have to be accompanied by statements about the methods, equipment, and all other relevant scientific concepts, and can include not just the high-level claim but also lower-level ones, possibly all the way down to the raw data. This representation would then ideally cover the entire scientific workflow, starting from a motivation, leading to the design and execution of a study, and ending in new scientific insights. Such fully formalized scientific contributions can be seen as a major step, even a breakthrough, for the Semantic Web and Open Science movements and will bring us closer to a world where machines can interpret scientific knowledge and help us organize and understand it in a reliable and transparent manner.