Making science computable: Developing code systems for statistics, study design, and risk of bias

Graphical abstract


Introduction
Crisis leads to innovation. The COVID-19 crisis stimulated collaborative efforts resulting in a breakthrough in the communication of evidence in the scientific literature. Today, evidence is not reported in a form that computers can understand; it is not yet expressed in precise, unambiguous (i.e., computable) formats. The near-infinite variation in how evidence can be expressed using natural language means that substantial expertise and contextual awareness are required for people to determine whether the evidence matters, to interpret what the evidence means, and to determine the certainty of these interpretations. To make scientific evidence shareable, interoperable, and computable, it is essential to use standardized concepts from controlled terminologies and vocabularies. This article introduces early efforts to develop an infrastructure for electronic data exchange for the identification, processing, and reporting of scientific findings, and presents a 13-step Code System Development Protocol created to support global development of terminologies for the exchange of scientific evidence.

Introduction to Fast Healthcare Interoperability Resources
Fast Healthcare Interoperability Resources (FHIR®) is rapidly overcoming the seemingly intractable interoperability problem in the sharing and exchange of health information [1]. FHIR solves the interoperability problems by breaking down key units of data exchange into resources. Each FHIR resource instance describes a distinct identifiable entity, and each FHIR resource has a FHIR StructureDefinition Resource instance that describes the set of data element definitions and their rules of use that define the FHIR specification itself. Rather than forcing all health-related knowledge to fit one organizational pattern for a common structural model, FHIR enables resource-specific structure definitions to enable the most efficient and flexible approach. Health Level 7 International (HL7®), the standards developing organization that created and maintains FHIR, addresses the human problem in universal agreement to a technical standard by supporting open, transparent, logical processes and systems for people from all perspectives to participate [2].
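The resource-per-entity pattern described above can be sketched with a minimal FHIR resource instance as exchanged in JSON, shown here as a Python dict. The general shape (a `resourceType` element naming the governing StructureDefinition) follows the FHIR specification; the identifier and name values are hypothetical.

```python
# A minimal sketch of a FHIR resource instance as exchanged in JSON.
# The "resourceType" element tells a receiving system which
# StructureDefinition governs the instance's allowed elements.
# The id and name values here are hypothetical.
patient = {
    "resourceType": "Patient",           # names the resource type
    "id": "example-patient-1",           # hypothetical logical id
    "name": [{"family": "Smith", "given": ["Jan"]}],
}

def resource_type(instance: dict) -> str:
    """Return the resource type a receiver would dispatch on."""
    return instance["resourceType"]

print(resource_type(patient))  # Patient
```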

Extension of FHIR to evidence-based medicine
There is currently no widely implemented standard that overcomes the seemingly intractable interoperability problem of sharing and exchange of computable representations of scientific knowledge. Facing such challenges with the communication of scientific knowledge to inform healthcare decision making, communities within and across researchers, systematic reviewers, guideline developers, and healthcare professionals have advanced human-interpretable expectations for trustworthy interpretation and application of scientific knowledge [3]. This area is often labeled evidence-based medicine (EBM), evidence-based practice, or evidence-based healthcare [4,5].
HL7 approved a project in 2018 to develop FHIR Resources for Evidence-Based Medicine Knowledge Assets (EBMonFHIR) [6]. In the following 18 months via weekly web meetings and five Connectathons, the EBMonFHIR project created FHIR StructureDefinition Resources for Evidence, EvidenceVariable, Statistic, and OrderedDistribution FHIR resources.
• The EvidenceVariable Resource is used to describe a variable used in statistical expressions, with one or more defining characteristics expressed using standardized concept codes (i.e., codable concepts [7]).
• The Statistic Resource supports the expression of a statistic, including the numerical values, the related attributes which are also statistics, and the type of statistic as a codable concept [8].
• The OrderedDistribution Resource supports expression of a statistical array [9].
• The Evidence Resource supports expression of the statistics for a distinct combination of variables and the certainty of the interpretation of the statistics [10].
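As a concrete illustration, a codable concept inside an EvidenceVariable characteristic can be sketched as below. The CodeableConcept shape (a `coding` list of `system`/`code`/`display` triples plus optional `text`) is standard FHIR; the surrounding element names are simplified, since the EvidenceVariable structure has evolved across FHIR versions.

```python
# Sketch of a codable concept within an EvidenceVariable characteristic.
# The CodeableConcept shape is standard FHIR; surrounding element names
# are simplified relative to the evolving resource definition.
evidence_variable = {
    "resourceType": "EvidenceVariable",
    "characteristic": [{
        "definitionCodeableConcept": {
            "coding": [{
                "system": "http://snomed.info/sct",  # SNOMED CT
                "code": "840539006",
                "display": ("Disease caused by severe acute "
                            "respiratory syndrome coronavirus 2"),
            }],
            "text": "COVID-19",
        }
    }],
}

coding = evidence_variable["characteristic"][0][
    "definitionCodeableConcept"]["coding"][0]
print(coding["code"])  # 840539006
```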
On March 30, 2020, we started the COVID-19 Knowledge Accelerator (COKA) and by July had more than 150 working meetings with more than 40 active participants from more than 25 organizations from academia, industry, government, and nonprofits in 7 countries [26]. The COKA developed 10 active working groups meeting virtually 12 times per week. COKA efforts revised the FHIR Statistic Resource to include expressions of the statistical model. COKA efforts also created two more FHIR StructureDefinition resources: Citation Resource to support exchange of about 100 elements used to identify articles referenced for scientific reporting [27], and EvidenceReport Resource to support compositions of all the other resources in many combinations [28].
Across the six FHIR resources maintained by the EBMonFHIR/COKA efforts, there were more than 30 elements that would benefit from the use of standardized encoded concepts. Some concepts can be expressed with commonly used code systems such as SNOMED CT® [29], RxNorm [30], and LOINC® [31]. However, we discovered many situations where we could not find a comprehensive code system that was functionally applicable for the concepts commonly communicated.

Development of code system development protocol
We initially developed code systems [32] with pragmatic approaches, using codable concepts found in other code systems where available (such as the STATistics Ontology [STATO] [33] and the National Cancer Institute Thesaurus [NCIt] [34]) and developing mnemonic codes for terms commonly used by the EBMonFHIR and COKA participants. Though functional for the growing but small community, this approach fell short of the desired interoperability with many related communities, including those represented in the HL7 CDS, Clinical Quality Information, BRR, and Vocabulary Work Groups. Achieving that interoperability demanded development of methods to support open, multinational, multidisciplinary input; comprehensive attention to existing ontologies; global consensus development; and sustainability planning.
Through multiple open virtual web meetings and shared documents, we developed a Code System Development Protocol (full protocol in Appendix A, related image in Fig. 1) which includes 13 steps [35]:
1) Assemble an expert working group.
2) Identify tools or systems commonly used today to express relevant concepts.
3) Map out a single list of non-redundant concepts to support common uses.
4) Identify existing ontologies that are openly available without restrictions.
5) Map related terms and definitions across the ontologies.
6) Define preferred terms, alternative terms, and definitions for the new code system.
7) Identify code system entries with universal agreement by the expert working group.
8) Deliberate suggested changes and reach universal agreement for code system entries where possible.
9) Deliberate unresolved disagreements and reach at least 80% agreement for code system entries where possible.
10) Determine the relative contribution of ontologies to the code system and seek further collaboration for heavily used ontologies.
11) Publish the initial version of the new code system.
12) Evaluate implementation of the code system and refine the system as needed.
13) Maintain continued support to adjust the code system based on changes in the prior 12 steps.

Scope setting
We selected four domains for initial application of the Code System Development Protocol and defined them as [35]:
• "The Statistic Type Code System will be used to precisely classify univariate statistics (such as mean, median, and proportion), comparative statistics (such as relative risk, mean difference, and odds ratio), and statistic attribute estimates (such as confidence interval, p value, and measures of heterogeneity). Consistent reporting across systems will facilitate interoperability for science communication.
• The Statistic Model Code System will precisely communicate characteristics that define the model used for a statistic. Science reports often do not convey complete information about statistical models. Model characteristics may include concepts such as fixed-effects analysis, linear regression, and Mantel-Haenszel method for pooling. Consistent reporting of statistical models will facilitate interoperability for science communication.
• The Study Design Code System will be used to precisely describe methodology characteristics of scientific observations including exposure introduction (such as interventional or observational), cohort definition (such as parallel, crossover or case-control), and group assignments (such as block randomization, every-other quasi-randomization, or non-randomized). Consistent reporting of research study design across systems will facilitate interoperability for science communication.
• The Risk of Bias Code System will be used to precisely describe concerns with methods or reporting of scientific observations including selection bias (such as gaps in randomization or allocation concealment), performance bias (such as gaps in blinding), and analysis bias (such as gaps in intention to treat analysis or selective analysis reporting). Consistent reporting of risk of bias across systems will facilitate interoperability for science communication."

Step 1: Assemble an expert working group
For Step 1, we developed an Invitation to Join an Expert Working Group for any of the four code systems (Statistic Type, Statistic Model, Study Design, Risk of Bias). Joining the group was open to anyone and group members could self-identify their expertise. Relevant expertise for a code system could include without limitation experience evaluating or expressing the concepts to be included in the code system, either for human interpretation or for machine interpretation.
We shared the invitations through multiple communities (mostly via email distribution lists) including the COKA Initiative, COVID-END, the evidence-based healthcare (EBH) listserv, Grading of Recommendations, Assessment, Development and Evaluation (GRADE) Working Group, the Developing and Evaluation Communication strategies to support Informed Decisions and practice based on Evidence (DECIDE) project participants, the AHRQ evidence-based practice centers (EPCs), the HL7 CDS and BRR work groups, the Society for Clinical Trials, the Society for Participatory Medicine, International Society for Clinical Biostatistics, and Patient-Centered Outcomes Research Institute (PCORI).

Step 2: Identify commonly used tools and systems
For Step 2, we asked Expert Working Group members to identify sources to signal the scope of (or common need for) a code system, namely tools or systems in common current use for reporting concepts relevant to the code system.

Step 3: Create lists of non-redundant concepts
For Step 3, we started with one of the common tools or systems, identified a series of non-redundant concepts for expression to support it, and provided a categorical classification. We then mapped the next identified tool or system, matched concepts where possible, added more concepts where needed, and adjusted the categorical classification. The process was shared openly during weekly Steering Group web meetings and summarized for the Expert Working Group by email distribution lists with open links to the Step 3 mapping spreadsheets.
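The Step 3 merge-and-extend process can be sketched as follows. The tool names and concept terms here are hypothetical placeholders; the point is the iterative matching of concepts from each newly mapped tool against the running non-redundant list.

```python
# Illustrative sketch (hypothetical tool contents) of the Step 3 process:
# start from one tool's concepts, then merge each further tool, matching
# existing concepts where possible and adding new ones where needed.
tool_a = ["mean", "median", "odds ratio"]
tool_b = ["median", "relative risk", "odds ratio"]

def merge_concepts(master, tool_terms):
    """Add each term from a tool unless it matches an existing concept."""
    merged = list(master)
    for term in tool_terms:
        if term not in merged:    # match concepts where possible
            merged.append(term)   # add more concepts where needed
    return merged

concepts = merge_concepts(tool_a, tool_b)
print(concepts)  # ['mean', 'median', 'odds ratio', 'relative risk']
```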

Time course for initial development
The COVID-19 Knowledge Accelerator consists of 10 active working groups meeting a total of 12 times weekly in open web meetings. Several working groups were developing code systems and the discussions about a common approach started on August 24, 2020. The first draft of a Code System Development Protocol with 11 steps was created on August 28. The protocol was finalized on September 17. Initial efforts were started ahead of wider dissemination of invitations. Invitations to join the expert working groups were sent widely during the week of September 21. All participants were asked to comment by an October 14 cutoff date for communicating the degree of contribution to Step 3 for version 1.0.0 of the code systems.
We report here the results of Steps 1-3 of this effort as of October 14, 2020. These results are not complete in terms of code system development, as they do not include definitions or codes and may change through the remaining steps. These remaining steps, and the overall protocol, share and build upon principles and practices in existing ontology development methods [36,37,38]. Key aspects such as reusing existing ontologies, enumerating important terms (i.e., concepts) across ontologies, and the overall iterative and agile nature of ontology development are well represented in our code system development protocol. We presented our protocol and preliminary findings in an October 30 Workshop on COVID-19 Ontologies (https://github.com/CIDO-ontology/WCO). In November 2020, we met with developers of STATO and the Ontology of Biological and Clinical Statistics (OBCS), both of which are Open Biological and Biomedical Ontologies (OBO) Foundry recognized ontologies. The ontology developers found our work valuable for identifying gaps, alignments, new terms, and other improvements for existing ontologies, and potentially for creating an application ontology.

Expert working groups
As of October 10, 2020, a total of 55 people from 26 countries on 6 continents had joined an Expert Working Group for up to four code system development efforts (see Table 1 and Appendix B).

Initial results (Step 2 and Step 3)
Twenty-three commonly used tools and systems were applied across the four code systems, ranging from 2 to 12 per code system (Table 2). There were 368 non-redundant concepts (draft display terms for a code system) identified across the four code systems, ranging from 53 to 170 per code system (Table 2, Appendices C, D, E and F).

Table 1. Demographics of 55 Members of Expert Working Groups.
Table 2. Step 2 and Step 3 Results to Inform Code System Development.

Progress toward code system development
Coordinating 55 experts from 26 countries to identify 198 concepts for the development of code systems for scientific methodology (statistics and study design) and 170 concepts for the assessment of quality of evidence (risk of bias) is an early step in what is needed to support interoperable data exchange for scientific communication.
Next steps include mapping concepts across ontologies, reaching universal or near-universal agreement for common code systems for data exchange, and continuous adaptation to meet needs discovered in implementation.
The COKA effort will benefit from the newly crafted HL7 Unified Terminology Governance (UTG) process, wherein terminology artifacts, such as the code systems and mappings we are creating, are published by HL7 [62]. The UTG approach aligns with our protocol by subjecting the artifacts created to an open comment and review process. The UTG process starts with transforming the code system and concept map terminology content into FHIR CodeSystem and ConceptMap artifacts, typically represented in FHIR JSON or XML [63]. Once the content is entered into the UTG environment, it exists as a set of proposed changes to the core HL7 terminology. Those proposals are made available for review and comment within the UTG environment, consistent with steps 12 and 13 of our protocol. Once comments on the proposed artifacts are resolved and voting requirements are met, if approved, the terminology additions are merged into the HL7 terminology environment at terminology.hl7.org, which is updated and made available through a continuous integration process [63]. In this way, updates and improvements for any content can be developed, proposed, reviewed, improved, voted on, and released within a documented environment aligned with the American National Standards Institute (ANSI)-sanctioned HL7 ballot process, and ultimately published as part of the official HL7 terminology content.
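A FHIR CodeSystem artifact of the kind submitted through UTG can be sketched as below, shown as a Python dict mirroring the FHIR JSON. The top-level element names (`url`, `status`, `content`, `concept` with `code`/`display`/`definition`) follow the FHIR CodeSystem resource; the canonical URL, code, and definition text are hypothetical placeholders, not published content.

```python
# Sketch of a FHIR CodeSystem artifact as it might be represented in
# FHIR JSON for the UTG process. The url, code, and definition values
# are hypothetical placeholders.
code_system = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/fhir/CodeSystem/statistic-type",  # hypothetical
    "status": "draft",
    "content": "complete",   # the resource lists all of its concepts
    "concept": [
        {
            "code": "relative-risk",   # hypothetical code
            "display": "Relative Risk",
            "definition": ("Ratio of the probability of an outcome in an "
                           "exposed group to that in an unexposed group."),
        },
    ],
}

print(code_system["concept"][0]["display"])  # Relative Risk
```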
Our protocol (step 6) includes entering data into an ontology web editor which by design would include top-level ontology concepts (classes, hierarchy, attributes) such as those represented in the Basic Formal Ontology [64] to help refine the classes and hierarchy. The consideration of the FHIR CodeSystem Resource StructureDefinition [65] in preparation for the UTG approach helped us realize we can represent these top-level ontology concepts as property elements within the CodeSystem Resource and we are currently considering modifying step 6c of our protocol to use FHIR tooling directly instead of a web ontology editor.

Strengths and limitations
Strengths of our approach include a substantial spirit of camaraderie across many diverse people facing a common challenge, multidisciplinary engagement, and coordination with global systems for standards development. In addition, use of FHIR as the underlying standard provides support from a method demonstrated to meet the interoperability needs of a similarly complex global community.
Limitations include the rapid timeline for development, having processed the initial listing of hundreds of concepts in just a month or so. There will undoubtedly be multiple revisions. The current list does not include outcome-specific statistic types (such as mortality for observed proportion or incidence related to death) or application-specific statistic types (such as recall instead of sensitivity for the application to information retrieval). This approach was purposefully taken to maximize simplicity and flexibility. Also, it is not yet established what resources will be needed to complete and maintain the code systems. For the initial effort, the degree of volunteerism and availability was influenced substantially by COVID-19 and we hope the spirit will continue for application across other domains.

Example for computable evidence
We demonstrate a computable expression of evidence [66] with the results (summary effect estimate) of a meta-analysis of three randomized trials [67,68,69] for the effect of remdesivir on 14-day mortality in patients with COVID-19 pneumonia. This example includes 43 instances of a "coding" element to express codable concepts, each with a "system" element to denote the code system, a "code" element to denote the specific code, and a "display" element for human-readable interpretation of the code (see Table 3). This example of computable evidence uses existing codes in published code systems where available, and these may differ from the code systems in development. Where codes are not available, we use "system": "not yet published" and "code": "not yet defined", which shows the need for creation or extension of code systems. One can search the JSON in this example to find 1 code related to study design ("display": "randomized trial"), 8 codes related to statistic type ("display" values of "Relative Risk", "Confidence Interval", "Z-score", "P-value", "I-squared", "Cochran's Q statistic", "degrees of freedom", and "Tau squared"), 4 codes related to statistic model ("display" values of "Meta-analysis", "Fixed-effects", "Random-effects", and "Dersimonian-Laird method"), and 1 code related to risk of bias ("display": "Lack of blinding"). In this example, the effect estimate is statistically significant using a fixed-effect model and not statistically significant using a random-effects model for the meta-analysis, a situation for which explicit representation of the statistic model is necessary for proper interpretation.
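The "coding" element pattern described above, including the placeholder convention for concepts not yet in any published code system, can be sketched as follows. The display values mirror those quoted in the text; the search helper is an illustration of how a parsed JSON instance can be queried.

```python
# Sketch of the "coding" element pattern: system / code / display.
# These entries use the placeholder convention described in the text
# for concepts lacking a published code system.
codings = [
    {"system": "not yet published", "code": "not yet defined",
     "display": "randomized trial"},
    {"system": "not yet published", "code": "not yet defined",
     "display": "Fixed-effects"},
]

def find_displays(codings, wanted):
    """Return all coding elements whose display matches the wanted term."""
    return [c for c in codings if c["display"] == wanted]

print(len(find_displays(codings, "randomized trial")))  # 1
```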

Benefits of code system development
When completed, the code systems will make finding knowledge easier. For example, systematic reviewers may specify study design concepts to facilitate identification of articles meeting their inclusion criteria. The code systems will also facilitate re-use of scientific results. For example, clinical trial reporters who express their results for regulatory purposes could re-use the data to express their results for publication, and systematic reviewers could directly re-use these results without the need for manual data extraction. All of these code systems will expedite recognition of the trustworthiness of scientific knowledge, whether one is seeking the data parameters (as expressed with statistic type codes), the methods for data creation (as expressed with study design and statistic model codes), or the assessments of others (as expressed with risk of bias codes). Someday, via explicitly encoded study results, data within published papers can integrate with clinical decision support systems, particularly when reporting meta-analysis results.
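The retrieval benefit for systematic reviewers can be sketched as below: filtering article records by a study design code rather than reading free text. The code system URI, codes, and records here are all hypothetical.

```python
# Hedged sketch of code-based retrieval: a systematic reviewer filters
# records by study design code. All systems, codes, and records below
# are hypothetical.
records = [
    {"title": "Trial A",
     "design": {"system": "example-study-design", "code": "randomized-trial"}},
    {"title": "Series B",
     "design": {"system": "example-study-design", "code": "case-series"}},
]

def matches_inclusion(record, system, code):
    """True if the record's design coding matches the inclusion criterion."""
    d = record["design"]
    return d["system"] == system and d["code"] == code

included = [r["title"] for r in records
            if matches_inclusion(r, "example-study-design", "randomized-trial")]
print(included)  # ['Trial A']
```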
We hope the processes, systems, and accomplishments we have produced so far in response to the COVID-19 crisis are sufficient to provide an infrastructure that will endure to make scientific communication accessible for a long time.

Conclusion
We started with efforts to support each other to accelerate knowledge transfer for COVID-19, and then developed solutions with expansive potential. We identified non-redundant concepts to support computable expression of scientific methods. Mapping these concepts to existing ontologies, selecting preferred terms and definitions by the global community, evaluating the implementation of the code systems, and supporting continued development of the systems will support an extensive ecosystem for communicating scientific evidence. More efficient scientific communication will reduce cost and burden and improve health outcomes, quality of life, and patient, caregiver and healthcare professional satisfaction. Anyone who is communicating these concepts may join the effort at https://www.gps.health/covid19_knowledge_accelerator.html [70].

Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: All authors are members of the COVID-19 Knowledge Accelerator (COKA) Initiative. The COKA Initiative is a volunteer virtual organization with no funding or contractual relations. The non-software content created by the COKA Initiative (including the data shared in this manuscript) is openly and freely available by Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. BSA is the owner of Computable Publishing LLC which may commercialize software services related to this content. JD and KS are employed by Computable Publishing LLC. MA, VS, AS, IK, and RCM have no conflicts to report.

Acknowledgement
We would like to thank Karen A. Robinson, Harold Lehmann, Zbys Fedorowicz, and Lehana Thabane for substantial contributions to the Code System Development Protocol. We thank Cheow Peng Ooi for assistance with citations. We also acknowledge contributions by Expert Working Group members with the level of contribution provided in Appendix B. We also thank Asiyah Yu Lin, Yongqun "Oliver" He, Jie Zheng, and Philippe Rocca-Serra for providing feedback on the extended value of our work for ontology development.

Funding sources
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Code System Descriptions
The Statistic Type Code System will be used to precisely classify univariate statistics (such as mean, median, and proportion), comparative statistics (such as relative risk, mean difference, and odds ratio), and statistic attribute estimates (such as confidence interval, p value, and measures of heterogeneity). Consistent reporting across systems will facilitate interoperability for science communication.
The Statistic Model Code System will precisely communicate characteristics that define the model used for a statistic. Science reports often do not convey complete information about statistical models. Model characteristics may include concepts such as fixed-effects analysis, linear regression, and Mantel-Haenszel method for pooling. Consistent reporting of statistical models will facilitate interoperability for science communication.
The Study Design Code System will be used to precisely describe methodology characteristics of scientific observations including exposure introduction (such as interventional or observational), cohort definition (such as parallel, crossover or case-control), and group assignments (such as block randomization, every-other quasi-randomization, or non-randomized). Consistent reporting of research study design across systems will facilitate interoperability for science communication.
The Risk of Bias Code System will be used to precisely describe concerns with methods or reporting of scientific observations including selection bias (such as gaps in randomization or allocation concealment), performance bias (such as gaps in blinding), and analysis bias (such as gaps in intention to treat analysis or selective analysis reporting). Consistent reporting of risk of bias across systems will facilitate interoperability for science communication.
Protocol Steps:
1. Assemble an expert working group for each code system.
   a. Expert working group membership will be open to any individual who self-identifies as a relevant expert for the code system. Relevant expertise for a code system may include but is not limited to experience evaluating or expressing the concepts to be included in the code system, either for human interpretation or for machine interpretation.
   b. We will post open invitations as email messages to the distribution lists for the COKA Initiative, COVID-END, EBH listserv, GRADE Working Group, DECIDE project participants, AHRQ EPC listserv, HL7 CDS and BRR work groups, the Society for Clinical Trials, the Society for Participatory Medicine, International Society for Clinical Biostatistics, and PCORI.
   c. With the invitation we will share an introduction to what a code system is, why we are doing this, a link to the protocol, and a link to a data entry form to sign up. Sign up at Code System Development Intake Form.
   d. The data entry form will include optional demographic questions (age, gender, race/ethnicity) for the sole purpose of reporting demographic distribution of the expert working group in submitted publications of the code system.
   e. Set up a code system steering group from the most actively engaged participants, specifically those who join open weekly work group meetings.
2. For each code system, identify sources to signal the scope of (or common need for) a code system, namely tools or systems in common current use for reporting the concepts relevant to the code system. Expert working group members will be asked to identify such sources.
3. Create a list of non-redundant concepts that convey the concepts in commonly used tools and systems.
   a. Categorical classifiers (names of code sets) may be added. (A concept may be a member of a code set.)
   b. A concept may be marked as "also serves as a categorical classifier" in which case the concept may be a "parent" in one or more IS-A relationships with other concepts. (A name of a code set may be a member of another code set.)
   c. A concept may be marked as being a "child" in an IS-A relationship with another concept by listing the "parent" concept as a categorical classifier.
   d. This list will be reviewed in the open work group meetings.
4. Identify ontologies likely to include concepts on the lists created in step 3. Expert working group members will be asked to identify such ontologies. We will limit the effort to ontologies available for use without restrictions (or limited to Category 0 or 1 Restrictions per UMLS Restriction Levels described at https://uts.nlm.nih.gov/help/license/licensecategoryhelp.html).
5. For each concept, from each ontology, extract the display (or preferred term), synonym list (or alternative terms), and definition(s) that best match the concept, and note closely related variations.
6. For each concept:
   a. Review the displays, synonym lists, and definitions available from ontologies.
   b. Draft a preferred display, synonym list, and definition, and note matches to the ontologies to measure relative contributions.
   c. Enter the draft preferred display, synonym list, and definition into an ontology web editor (such as WebProtege). If approved, the dataset can be shared with National Cancer Institute (NCI) Enterprise Vocabulary Services (EVS) for entry in the NCI Thesaurus and exported for use with WebProtege.
7. Each member of the expert working group will, for each concept that will be a code system entry, note agreement (with the draft preferred display, synonym list and definition) or suggest changes.
   a. For concepts that are "parents" in IS-A relationships, agreement will also be sought that the concept is useful functionally without subordinate coding.
   b. For concepts that are "children" in IS-A relationships, agreement will also be sought that if the child concept applies then the parent concept must apply AND the parent concept can apply while the child concept does not apply.
   c. This process will be online and asynchronous.
8. For any concepts without universal agreement we will discuss the suggested changes in open meetings, revise as appropriate, then resend for voting as noted in step #7.
9. If a concept does not achieve universal agreement (cycling through steps 7 and 8 with conflicting suggestions):
   a. Each person recommending changes will write a rationale.
   b. The rationales will be shared with the expert working group prior to a group meeting.
   c. The group meeting will discuss and prepare the preferred version. The preferred version and meeting discussion will be shared with the group.
   d. Group members will have 48 h to vote for the presented version.
   e. The preferred version will become the included version if it achieves at least 80% agreement with at least 5 people voting.
   f. If unable to achieve at least 80% agreement with at least 5 people voting, options may include extending the voting period, dropping the item, or preparing for another group discussion.
10. For the first complete version of the code system with agreement reached for all entries, we will determine the percent contribution from the different ontologies. If an ontology provides >50% contribution across the series of code systems or >75% contribution to a single code system, we may consider deeper collaboration rather than continued maintenance of a new code system.
11. We will publish the code system at terminology.hl7.org and seek publication of introductory articles to the code system in the biomedical literature.
12. For implementation and initial evaluation of the code system:
    a. Identify tools and systems that could use the code system.
    b. Offer support for implementation. Measure the proportion of systems that get engaged.
    c. Evaluate ease of use.
    d. Generate code system change requests as needed.
    e. Track systems that implement the code system and set a regular review interval to inquire about usefulness and change requests.
13. For ongoing maintenance and development of the code system:
    a. Maintain an open invitation for code system users to join the expert working group for continued feedback.
    b. Maintain a method for expert working group members to suggest additional tools or systems with common current use of concepts matching the code system.
    c. Code system changes may be initiated by change requests from the community.
    d. The code system steering group will validate that change requests are appropriate for group deliberation (e.g., fits the purpose of the code system, has sufficient rationale, avoids duplication).
    e. Valid change requests will lead to drafting a preferred display, synonym list, and definition.
    f. Each member of the expert working group will, for each valid change request, note agreement (with the draft preferred display, synonym list and definition) or suggest changes. This process will be online and asynchronous. (step #7)
    g. For any concepts without universal agreement we will discuss the suggested changes in open meetings, then resend for voting as noted in steps #7 and #8. If not reaching universal agreement, manage as step #9.
    h. Changes to the code system will be published at terminology.hl7.org and released as needed.

The list above was the current list as of October 15, 2020. The list continues to evolve and the current list can be found at https://confluence.hl7.org/display/CDS/COKA+Code+System+Development+Working+Groups.
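The step 9 voting rule (at least 80% agreement with at least 5 people voting) can be sketched as a simple check:

```python
# Sketch of the step 9 acceptance rule: a preferred version is included
# only with >= 80% agreement and at least 5 votes cast.
def accepted(votes_for: int, votes_total: int) -> bool:
    return votes_total >= 5 and votes_for / votes_total >= 0.80

print(accepted(4, 5))  # True: 80% agreement with 5 votes
print(accepted(3, 4))  # False: fewer than 5 votes cast
```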