Surgical data science – from concepts toward clinical translation

Recent developments in data science in general and machine learning in particular have transformed the way experts envision the future of surgery. Surgical Data Science (SDS) is a new research field that aims to improve the quality of interventional healthcare through the capture, organization, analysis and modeling of data. While an increasing number of data-driven approaches and clinical applications have been studied in the fields of radiological and clinical data science, translational success stories are still lacking in surgery. In this publication, we shed light on the underlying reasons and provide a roadmap for future advances in the field. Based on an international workshop involving leading researchers in the field of SDS, we review current practice, key achievements and initiatives as well as available standards and tools for a number of topics relevant to the field, namely (1) infrastructure for data acquisition, storage and access in the presence of regulatory constraints, (2) data annotation and sharing and (3) data analytics. We further complement this technical perspective with (4) a review of currently available SDS products and the translational progress from academia and (5) a roadmap for faster clinical translation and exploitation of the full potential of SDS, based on an international multi-round Delphi process.


Introduction
More than 15 years ago, in 2004, leading researchers in the field of computer aided surgery (CAS) organized the workshop "OR2020: Operating Room of the Future". Around 100 invited experts including physicians, engineers, and operating room (OR) personnel attended the workshop (Cleary et al., 2004) to define the OR of the future, with 2020 serving as the target time frame. Interestingly, many of the problems and challenges identified back in 2004 do not differ substantially from those we are facing today. Already then, researchers articulated the need for "integration of technologies and a common set of standards", "improvements in electronic medical records and access to information in the operating room", as well as "interoperability of equipment". In the context of data-driven approaches, they criticized the lack of an "ontology or standard" for "high-quality surgical informatics systems" and underlined the need for a "clear understanding of surgical workflow and modeling tools". Broadly speaking, the field has not made progress as quickly as researchers had hoped at the time.
More recently, the renaissance of data science techniques in general and deep learning (DL) in particular has given new momentum to the field of CAS. In response to the general artificial intelligence (AI) hype, a consortium of international experts joined forces to discuss the role of data-driven methods for the OR of the future. Based on a workshop held in 2016 in Heidelberg, Germany, the consortium defined Surgical Data Science (SDS) as a scientific discipline with the objective of improving "the quality of interventional healthcare and its value through capture, organization, analysis, and modelling of data". In this context, "data may pertain to any part of the patient care process (from initial presentation to long-term outcomes), may concern the patient, caregivers, and/or technology used to deliver care, and are analyzed in the context of generic domain-specific knowledge derived from existing evidence, clinical guidelines, current practice patterns, caregiver experience, and patient preferences". Importantly, SDS involves the physical "manipulation of a target anatomical structure to achieve a specified clinical objective during patient care" (Maier-Hein et al., 2018a). In contrast to general biomedical data science, it also includes procedural data as depicted in Fig. 1.
Three years later, in 2019, an international poll revealed that no commonly recognized surgical data science success stories exist to date, while success stories in other fields have been dominating media reports for years, as detailed in Section 2. The purpose of this paper was therefore to go beyond the broad discussion of the potential of SDS by providing an extensive review of the field and identifying concrete measures to pave the way for clinical success stories. The paper is based on an international workshop that took place in June 2019 in Rennes, France, and is structured according to core topics discussed at the workshop. In Section 2, we will review the questionnaire that served as the basis for the workshop as well as an international 4-round Delphi process (Hsu and Sandford, 2007) we conducted with 50 clinical and technical stakeholders from 51 institutions to present concrete goals for the future. In the ensuing sections, we will present the current practice, key initiatives and achievements, standards, platforms and tools as well as current challenges and next steps for the main building blocks of SDS, namely technical infrastructure for data acquisition, storage and access (Section 3), methods for data annotation and sharing (Section 4) as well as data analytics (Section 5). A section about achievements, pitfalls and current challenges related to clinical translation of SDS (Section 6) and a discussion of our findings (Section 7) will close the manuscript. While, by definition, SDS encompasses multiple interventional disciplines, such as interventional radiology and gastroenterology, the present paper puts a strong focus on surgery.

Lack of success stories in surgical data science
Machine learning (ML) has begun to revolutionize almost all areas of healthcare. Success stories cover a wide variety of application fields ranging from radiology and dermatology to gastroenterology and mental health applications (Miotto et al., 2018; Topol, 2019). Strikingly, such success stories appear to be lacking in surgery.
The international Surgical Data Science Initiative was founded in 2015 with the mission to pave the way for AI success stories in surgery. A key result of the first workshop, which was inspired by current open space and think tank formats, was a common definition of SDS and a thorough description of the challenges in applying AI in interventional healthcare. The second edition of the workshop in 2019 focused on a comprehensive overview of the field, including key research initiatives, industrial perspectives and first success stories. Prior to the workshop, the registered participants were asked to fill out a questionnaire covering various aspects of SDS. Of the 77 participants, 43% were professors or academic group leaders (clinical or engineering), while the remainder were mostly either from industry (14%) or PhD students/postdocs (36%). The majority of participants (61%) agreed that the most important developments since the previous workshop in 2016 were related to advances in AI. Notably, however, when participants were asked about the most impressive SDS paper, only a single paper (the position paper from the first workshop) was mentioned more than twice (primarily by non-co-authors). The majority of participants agreed that the lack of representative annotated data is the main obstacle in the field and the main reason for the failure of previous SDS projects. When referring to their personal experience, 33% attributed the failure of an SDS project primarily to lack of data, followed by underestimation of the problem complexity (29%). EndoVis (28%), Cholec80 (21%) and JIGSAWS (17%) were mentioned as the most useful publicly available data sets, but the small size/limited representativeness of the data sets was identified as a core issue (45%).
Based on the replies to the questionnaire and the subsequent workshop discussions, we identified four areas that are essential for moving the field forward: (1) Technical infrastructure for data acquisition, storage and access, (2) data annotation and sharing, (3) data analytics, and (4) aspects related to clinical translation. These are reflected in the four main sections of this paper. We then conducted a Delphi process involving a consortium of 50 medical and technical experts from 51 institutions (see list of co-authors) to formulate a mission statement along with a set of goals necessary to accomplish the respective mission (see Tables 2, 3, 4 and 7) for each of the four areas. More specifically, the coordinating team of the Delphi process (eight members from five institutions; nonvoting) put forth an initial mission statement and an initial set of goals for each of the four missions based on the workshop discussions. In a 4-round Delphi process, the remaining consortium members then iteratively refined the phrasing of the mission statements and goals and added further proposals for goals. This process yielded a set of 6-9 goals per mission that received support from at least two thirds of the voting members. Finally, the consortium collaboratively compiled a list of relevant stakeholders (Table 1) and then rated their importance for the four missions (Appendix F). To avoid redundancy, the consortium further agreed on the following: Context statement: Unless otherwise specified, in all of the following text, a) surgical data science (SDS) represents the general context of the suggested phrases and b) "data" may pertain to any part of the patient care process (from initial presentation to long-term outcomes), may concern the patient, caregivers and/or technology used to deliver care and must be acquired, stored, and shared in accordance with both local and international regulatory constraints.
In general, c) data handling should comply with the FAIR (Findability, Accessibility, Interoperability, and Reuse) principles (Wilkinson et al., 2016) and d) user-friendliness should be a guiding principle in all processes related to data handling. Finally, e) the term SDS stakeholders refers to clinical, research, industrial, regulatory, public and private stakeholders.
Based on the international questionnaire, the on-site workshop and the subsequent Delphi process, the following sections present the perspective of the members of the international data science initiative on the identified key aspects for generating SDS success stories.

Technical infrastructure for data acquisition, storage and access
To date, the application of data science in interventional medicine (e.g. surgery, interventional radiology, endoscopy, radiation therapy) has received comparatively limited attention in the literature. This can partly be attributed to the fact that only a fraction of patient-related data and information is digitized and stored in a structured manner (Hager et al., 2020) and that doing so is often infeasible in modern ORs. This section focuses on current hurdles in creating an environment that can record and structure highly heterogeneous surgical data for long-term usage.
Service (NHS). In other countries, equivalents for data protection exist and are related to the legal frameworks of the respective healthcare system.
From an ethico-legal perspective, it is worth noting that companies commonly obtain surgical data either through contracts with individual consulting surgeons, licensing agreements with hospitals or in exchange for discounted pricing of their products. This current practice raises important issues regarding power imbalances and the democratization of data access (August et al., 2021).

Key initiatives and achievements
This section presents prominent SDS initiatives with a specific focus on data acquisition, access and exchange.
Data acquisition: Several industrial and academic initiatives have been proposed to overcome the bottleneck of prospective surgical data acquisition.
The DataLogger (KARL STORZ SE & Co. KG, Tuttlingen, Germany) is a technical platform for synchronously capturing endoscopic video and device data from surgical devices such as the endoscopic camera, light source and insufflator. The DataLogger has served as a basis for the development of a Smart Data Platform as part of the InnOPlan project (Roedder et al., 2016) and has been continuously expanded to support an increasing number of medical devices and clinical information systems. It has also been used to collect data for Endoscopic Vision challenges (e.g. EndoVis-Workflow; EndoVis-Workflow and Skill; EndoVis-ROBUST-MIS).
The OR Black Box® (Goldenberg et al., 2017) is a platform that allows healthcare professionals to identify, understand, and mitigate risks that impact patient safety. It combines input from video cameras, microphones, and other sensors with human and automated processing to produce insights that lead to improved efficiency and reduced adverse events. The OR Black Box has been in operation in Canada since 2014, in Europe since 2017 and in the USA since 2019. An early analysis of OR Black Box use in laparoscopic procedures of over 100 patients demonstrated that errors and distractions, as annotated by experts viewing the procedures, took place in every case and often went unnoticed, or at least were not recalled by the surgeon at the time (Jung et al., 2020).
In Strasbourg, France, the Nouvel Hôpital Civil (NHC), the Institut de Recherche contre les Cancers de l'Appareil Digestif (IRCAD) and the Institut hospitalo-universitaire (IHU) record surgery videos for education and research purposes. These are curated and used mainly for IRCAD's WebSurg (Mutter et al., 2011), a free online reference for video-based surgery training with over 370,000 members.
Data exchange: The OR.NET project laid important foundations in the shape of a service-oriented communication protocol for the dynamic cross-vendor networking of medical devices and resulted in the International Organization for Standardization (ISO)/Institute of Electrical and Electronics Engineers (IEEE) 11073 Service-oriented Device Connectivity (SDC) series of standards (see Section 3.3). The projects InnOPlan (Roedder et al., 2016) (see paragraph "Data acquisition") and OP 4.1 also used SDC as the basis for device communication. InnOPlan's Smart Data platform enables real-time provision and analysis of medical device data to enable data-driven services in the operating room. The project OP 4.1 aimed at developing a platform for the OR, in analogy to an operating system for smartphones, that allows for integration of new technical solutions via apps.
The project Connected Optimized Network & Data in Operating Rooms (CONDOR) is another collaborative endeavor that aims to build a video-driven Surgical Control Tower (Padoy, 2019) within the new surgical facilities of the IRCAD and IHU Strasbourg hospital by developing a novel video standard and new surgical data analytics tools. A similar initiative is the Operating Room of the Future (ORF), which researches device integration in the OR, workflow process improvement, as well as decision support by combining patient data and OR devices for MIS (Stahl et al., 2005).

Standards, platforms and tools
Standards, platforms and tools have focused on the topics of interoperability as well as data storage and exchange.
3.3.1. Interoperability
Interoperability is defined by IEEE as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" (IEEE, 1991) or by the Association for the Advancement of Medical Instrumentation (AAMI) as "the ability of medical devices, clinical systems, or their components to communicate in order to safely fulfill an intended purpose" (AAMI, 2012).
Numerous standards have been introduced to provide interoperability including Health Level 7 (HL7), IEEE 11073, American Society for Testing and Materials (ASTM) F2761 (Integrated Clinical Environment (ICE)), DICOM, ISO TC215, European Committee for Standardization (CEN) TC251 and International Electrotechnical Commission (IEC) 62A. Different levels of interoperability can be distinguished, for example through the seven-level Levels of Conceptual Interoperability Model (LCIM) from Tolk et al. (2007), which is defined as follows (Wang et al., 2009):

• Level 0 - No interoperability: Two systems cannot interoperate.
• Level 1 - Technical interoperability: Two systems have the means to communicate, but have no shared understanding of the structure or meaning of the data communicated. The systems have common physical and transport layers.
• Level 2 - Syntactic interoperability: Two systems communicate using an agreed-upon protocol with structure but without any meaning. The systems exchange data using a common format.
• Level 3 - Semantic interoperability: Two systems communicate with structure and have agreed on the meaning of the exchanged terms. Only the meaning of the exchanged data itself is understood.
• Level 4 - Pragmatic interoperability: Two systems communicate with a shared understanding of data, the relationships between elements of the data, and the context of the data, but these systems do not support changing relationships or context over time. The meaning of the exchanged data and the relationships between pieces of information are understood.
• Level 5 - Dynamic interoperability: Two systems are able to adapt their information models to changing meaning and context of data over time. Evolving semantics are understood.
• Level 6 - Conceptual interoperability: Includes the understanding and exchange of complex concepts. Systems are aware of each other's underlying assumptions, models and processes.
The number of interoperability levels varies from model to model and depends on the goal of the intended classification. For example, Lehne et al. (2019) use only four levels, the first two being identical to those listed above; the third, also called "semantic interoperability", addresses the complexities mentioned in levels 3 to 5 here, and the fourth puts forth the concept of "Organisational Interoperability", which includes aspects of levels 5 and 6. The following paragraphs use the LCIM to classify the standards of interest to this paper.
(1) Technical interoperability: Modern hospitals typically have sophisticated networks, which makes technical interoperability the most achievable level (Lehne et al., 2019). The main challenge inside the OR, where real-time capability is often critical, is the available bandwidth. An uncompressed Full HD video stream at 60 fps with a color depth of 24 bit requires a bandwidth of 2.98 Gigabit per second (Gbps; not to be confused with Gigabyte per second (GBps), which is eight times larger). Available Ethernet ports typically have a data transfer rate of 1 Gbps. While more modern installations may reach Ethernet data transfer rates of 10 Gbps, this technology is still expensive and typically reserved for networks in data centers. Wireless networks are even slower: modern devices often support theoretical speeds between 0.45 Gbps and 1.3 Gbps, which results in an effective bandwidth of around 50% of the theoretical limit. The newest Wi-Fi (Wireless Fidelity) 6 standard, released in late 2019, increases this theoretical limit to over 10 Gbps under laboratory conditions, but effective speeds and adoption rates remain to be seen. In general, Wi-Fi suffers from higher latency and greater performance variability, depending on a number of environmental factors. Critically, Wi-Fi packets may be lost if interference between networks is too high, causing latency spikes of potentially several hundred milliseconds, which may negatively affect real-time applications. The new 5G standard for wireless communication can potentially ease some of these problems by reaching theoretical speeds of 20 Gbps and avoiding conflicts with other networks, since the relevant frequencies are licensed for specific areas. Additionally, 5G as a method of Internet access could enable the transfer of large amounts of data to and from the hospital in relatively short time, something which previously required fast physical connections, such as optical fibre, that are not readily available.
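The bandwidth figure quoted above can be reproduced with a quick back-of-the-envelope calculation:

```python
# Raw bandwidth estimate for an uncompressed video stream:
# pixels per frame x frames per second x bits per pixel.

def video_bandwidth_gbps(width: int, height: int, fps: int, bit_depth: int) -> float:
    """Raw bandwidth in gigabits per second (1 Gbps = 1e9 bit/s)."""
    return width * height * fps * bit_depth / 1e9

# Full HD (1920x1080) at 60 fps with 24-bit color, as in the text.
full_hd = video_bandwidth_gbps(1920, 1080, fps=60, bit_depth=24)
print(f"Uncompressed 1080p60 @ 24 bit: {full_hd:.2f} Gbps")  # ≈ 2.99 Gbps
```

The result of roughly 3 Gbps makes clear why a standard 1 Gbps Ethernet port cannot carry even a single uncompressed Full HD stream.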
While limitations of available bandwidth can be mitigated by using data compression, importantly, "losses imperceptible to humans" can still impede algorithm performance.
It is worth noting that, especially inside the OR, there are still devices that are entirely unable to connect to networks (from basic technical infrastructure such as doors or lights to routine medical equipment such as certain anesthesia systems) or that are not connected due to missing capacity (e.g. Ethernet sockets) or missing software add-ons (e.g. a proprietary application programming interface (API)).
(2) Syntactic interoperability: At this level, the structure of exchanged data is defined along with basic semantic information. This level is arguably where most of today's efforts in medical data interoperability take place, and where a number of standards compete. A major player in standardization is HL7 (Kalra et al., 2005), which has developed standards for the exchange of patient data since 1987. The eponymous HL7 standard has been continuously updated and most notably includes the Version 3 Messaging Standard, which specifies interoperability for health and medical transactions. HL7 has been criticized for the complexity of its implementation (Goldenberg et al., 2017), resulting in the proposal of HL7 Fast Healthcare Interoperability Resources (FHIR), which simplifies implementation through the use of widely applied web technologies. Another important standard is provided by the openEHR foundation. In contrast to HL7, openEHR is not only a standard for medical data exchange, but an architecture for a data platform that provides tools for data storage and exchange. With this, however, come added complexity and challenges.
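To illustrate FHIR's reliance on plain web technologies, the sketch below constructs a minimal FHIR-style Patient resource as JSON; the patient details are invented placeholders, and a real exchange would transfer the same payload over HTTP (e.g. `GET <base>/Patient/<id>` against a FHIR server).

```python
import json

# Minimal FHIR R4 Patient resource expressed in JSON. The id and name
# are placeholder values for illustration only.
patient = {
    "resourceType": "Patient",
    "id": "example-001",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1980-04-02",
}

# Serializing and parsing JSON stands in here for the HTTP round trip
# a FHIR server and client would perform.
payload = json.dumps(patient)
restored = json.loads(payload)
assert restored["resourceType"] == "Patient"
```

The use of ubiquitous formats like JSON is precisely what lowers the implementation barrier compared to earlier HL7 messaging standards.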
HL7 and openEHR provide the broadest scope of medical data exchange, but both build on standards that solve specific subtasks. While a complete listing is out of scope for this article, one notable exception is DICOM, which today is the undisputed standard for the management of medical imaging information. In 2019, DICOM was extended to include real-time video (DICOM Real-Time Video (DICOM-RTV)). This extension is an IP-based DICOM service for transmitting and broadcasting real-time video, with synchronized metadata, to subscribers (e.g. a monitor or SDS application server) with a quality comparable to standard OR video cables.
The previously mentioned standards focus on enabling the exchange of individual patient data between Hospital Information Systems (HIS). Inside the OR, requirements differ, since a host of devices create real-time data streams that focus on sensor input rather than direct patient information (diagnosis, habits, morbidity). Accordingly, data exchange standards inside the OR are geared toward these data types. OpenIGTLink (Tokuda et al., 2009), for example, started as a communication protocol for Image Guided Therapy (IGT) applications. Today, OpenIGTLink has been expanded to exchange arbitrary types of data by providing a general framework for data communication. However, it does not define broad standards for the data format, instead relying on users to implement details according to their needs. Through this model, OpenIGTLink enabled data exchange inside the OR long before broad standards were feasible. Similarly, for the field of robotics, the Robot Operating System (ROS) (Koubaa, 2016) has been proposed.
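The design principle behind such protocols, a small typed header followed by an opaque body, can be sketched as follows. This is an illustration inspired by OpenIGTLink's header/body layout, not the actual wire format (which additionally carries versioning, timestamps and a CRC):

```python
import struct

# Simplified message framing: a fixed-size header naming the payload
# type and source device, followed by an opaque body whose format is
# left to the communicating parties (as in OpenIGTLink's model).
HEADER = struct.Struct("!12s20sQ")  # type name, device name, body size

def pack_message(msg_type: str, device: str, body: bytes) -> bytes:
    header = HEADER.pack(msg_type.encode()[:12], device.encode()[:20], len(body))
    return header + body

def unpack_message(data: bytes):
    msg_type, device, size = HEADER.unpack_from(data)
    body = data[HEADER.size:HEADER.size + size]
    return msg_type.rstrip(b"\0").decode(), device.rstrip(b"\0").decode(), body

# Example: a tracking device sending an (opaque) pose payload.
wire = pack_message("TRANSFORM", "Tracker01", b"\x00" * 48)
assert unpack_message(wire)[:2] == ("TRANSFORM", "Tracker01")
```

Leaving the body format open is what allowed such protocols to connect heterogeneous OR devices long before broad data standards existed, at the cost of per-installation conventions.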
More recent efforts by the OR.NET initiative (see Section 3.2) produced the IEEE 11073 SDC ISO standard, which provides a means for general data and command exchange and enables users to control devices in the OR. Standards less specific to the healthcare environment are also available. Similar to OpenIGTLink, Internet of Things (IoT) protocols, for example, define device communication without defining standards for the communicated data. While IoT technology has been used for data exchange between information systems (Xie et al., 2018) and between devices in the OR (Miladinovic and Schefer-Wenzl, 2018), it has elicited mixed reactions.
(3) Semantic interoperability: This is the domain of clinical nomenclatures, terminologies and ontologies. While modern standards like HL7 FHIR and openEHR already define basic semantics in data exchange, extending these annotations to more powerful nomenclatures like SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms) (Cornet and de Keizer, 2008) (see Section 4) enables systems to not only share data, but also their exact meaning and scope (i.e. what kind of data exactly falls under the given definition). To illustrate the difference between this level and the previous: HL7 FHIR defines less than 200 healthcare concepts (i.e. terms with a well-defined meaning) (Bender and Sartipi, 2013), while SNOMED CT defines more than 340,000 concepts (Miñarro-Giménez et al., 2019). Today, semantic interoperability is largely defined by terminologies (systematic lists of vocabulary), ontologies (definitions of concepts and categories along with their relationships) and taxonomies (classifications of entities, especially organisms), the borders between which are often fluid. Standard languages such as the Resource Description Framework (RDF), Resource Description Framework Schema (RDFS) and the Web Ontology Language (OWL) (Bechhofer, 2009) have been defined by the World Wide Web Consortium (W3C), guaranteeing interoperability between ontology resources and data sets based on these ontologies. The aforementioned SNOMED CT is arguably the most complete terminology, spanning the whole field of clinical terms with a wide set of available translations. However, specialized alternatives may perform better in their respective fields. Additionally, a host of medical ontologies are available.
Most notable is the family of ontologies gathered under the Open Biological and Biomedical Ontologies (OBO) Foundry (Smith et al., 2007), which cover a wide array of topics from the biomedical domain and share the Basic Formal Ontology (BFO) (Grenon and Smith, 2004) as a common top-level ontology. Intraoperatively, OntoSPM provides terminology for the annotation of intraoperative processes and has spawned efforts for the annotation of binary data (Katić et al., 2017). Common to all these efforts is that they serve best in combination with a standard addressing syntactic interoperability, to which they can add semantic information. Semantic interoperability goes hand in hand with data annotation and is expanded upon in Section 4.
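The practical value of shared concepts can be sketched in a few lines: instead of exchanging free-text labels, systems exchange concept identifiers that both sides can resolve. The SNOMED CT identifier below is a placeholder, not a real concept code:

```python
# Toy illustration of semantic annotation: sender and receiver map their
# local vocabularies onto shared concept URIs, so exchanged data carries
# an exact, resolvable meaning. The concept id "0000000" is a placeholder.
SCT = "http://snomed.info/id/"

# Sender: local free-text term mapped to a shared concept.
local_to_concept = {"lap. gallbladder removal": SCT + "0000000"}

event = {
    "timestamp": "2019-06-18T09:41:00Z",
    "procedure": local_to_concept["lap. gallbladder removal"],
}

# Receiver: its own label for the same concept URI.
concept_to_label = {SCT + "0000000": "laparoscopic cholecystectomy"}
assert concept_to_label[event["procedure"]] == "laparoscopic cholecystectomy"
```

In real systems, the mapping tables on both sides would be backed by a terminology server rather than hard-coded dictionaries.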
It is important to note that semantic interoperability does not guarantee the availability of data. If two hospitals have agreed on a detailed semantic model but record different parameters for a specific procedure, the two resulting data sets will contain well-defined but empty fields. To avoid this, it is necessary to agree on lists of recorded parameters, e.g. in the form of common data elements (CDEs).
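This pitfall is easy to demonstrate; in the sketch below (with invented field names), two hospitals share a schema but record different subsets of it, so aligned records end up with well-defined but empty fields:

```python
# Two hospitals agree on a common schema but record different parameters;
# aligning their records to the schema yields well-defined but empty
# fields. Field names are illustrative.
schema = ["blood_loss_ml", "op_duration_min", "asa_score"]

hospital_a = {"blood_loss_ml": 120, "op_duration_min": 85}
hospital_b = {"op_duration_min": 95, "asa_score": 2}

aligned = [{f: rec.get(f) for f in schema} for rec in (hospital_a, hospital_b)]
assert aligned[0]["asa_score"] is None      # defined in the schema, never recorded
assert aligned[1]["blood_loss_ml"] is None
```

Agreed-upon CDE lists address exactly this gap by making the recorded parameter set, not just its semantics, part of the agreement.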
(4) Pragmatic interoperability: In order to define context, additional modeling is required to capture data context and the processes involved. This can in part be achieved by extending modeling efforts from the semantic interoperability level to include these concepts. Furthermore, efforts to formalize the exchange processes themselves are required. IEEE 11073 provides descriptions of architecture and protocol (IEEE 11073-20701), while HL7 offers the IHE Patient Care Device (PCD) implementation guide and the conformance model.
For the remaining two levels, developments are more recent and less formalized. Level (5), dynamic interoperability, requires modeling how the meaning of data changes over time. This can range from simple state changes (planned operations becoming realized, proposed changes becoming effective) to new data types being introduced and old data types changing meaning or being deprecated. IEEE 11073 supports these aspects through its participant key purposes, and HL7 through its workflow descriptions. Finally, Level (6), conceptual interoperability, allows for exchanging and understanding complex concepts. This requires a means to share the conceptual model of the system, its processes, state, architecture and use cases. This level can be achieved by defining use cases and profiles (e.g. IHE Service-oriented Device Point-of-care Interoperability (SDPi) Profiles) and/or by providing reference architectures and frameworks.
3.3.2. Data storage and distribution
While current standards have focused on data exchange, they typically do not address data distribution and storage. In current practice, data is exchanged between two defined endpoints (e.g. a tracking device and an IGT application, or a computed tomography (CT) scanner and a PACS). To achieve a system that can be dynamically expanded with regard to its communication capabilities, it is necessary to employ messaging technology. Such tools allow arbitrary devices to take part in communication by registering via a message broker, where messages can typically be filtered by origin, type or destination, for instance. Examples include Apache Kafka (Kim et al., 2017; Spangenberg et al., 2018) and RabbitMQ® (Ongenae et al., 2016; Trinkūnas et al., 2018). Such systems enable developers to create flexible data exchange architectures using technologies that are mature and usually well documented thanks to their wide application outside the field of healthcare. However, they also create a level of indirection that introduces additional delay (which may be negligible at only a few milliseconds in local networks, or significant at several tens or even hundreds of milliseconds over the Internet or wireless networks).
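The decoupling that brokers such as Kafka or RabbitMQ provide can be illustrated with a minimal in-memory publish/subscribe sketch (the topic names are invented examples); producers and consumers only know topic names, never each other:

```python
from collections import defaultdict

# Minimal in-memory publish/subscribe broker, illustrating the
# indirection of messaging systems: any device can register for a topic,
# and publishers need no knowledge of who (if anyone) is listening.
class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: dict) -> None:
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("or1/insufflator/pressure", received.append)
broker.publish("or1/insufflator/pressure", {"mmHg": 12})
broker.publish("or1/camera/frame", {"seq": 1})  # no subscriber registered
assert received == [{"mmHg": 12}]
```

Production brokers add exactly what this sketch omits: persistence, delivery guarantees and network transport, which is where the additional latency discussed above originates.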
Finally, recording the exchanged data requires distinct solutions as well. High-performance, high-reliability databases are an essential requirement for many modern businesses. Thanks to this demand, a large body of established techniques exists, from which users can select the right tool for their specific needs. Binary medical data (images, videos, etc.) can be stored on premises in modern PACSs, which provide extensive support for data annotation, storage and exchange. For clinical metadata, the choice of technology typically depends on the level of standardization of the recorded data. Highly standardized data can possibly be stored directly through interfaces of, e.g., the IHE family of standards.
If the target data are not standardized but homogeneous, a database model for classical database languages (e.g. Structured Query Language (SQL)) may be suitable. Use cases in which a wide array of highly heterogeneous data is recorded may call for modern NoSQL databases. These databases do not (or not exclusively) rely on classical tabular data models, but instead allow the storage and querying of tree-like structures. The JavaScript Object Notation (JSON) format is a popular choice for NoSQL databases due to its wide support in toolkits and its immediate applicability to Representational State Transfer (REST) APIs. While applications of these databases were initially geared toward data lakes because of their relative ease of application, NoSQL databases have recently seen widespread application in big data and ML (Dasgupta, 2018). A notable example is Elasticsearch (Elastic NV, Amsterdam, the Netherlands), which has achieved widespread distribution and is ranked among the most used search servers (DB-Engines, 2020).
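The appeal of document-style storage for heterogeneous case metadata can be sketched with SQLite from the Python standard library; the record fields are invented examples, and filtering is done in Python so the sketch does not depend on SQLite's optional JSON1 extension:

```python
import json
import sqlite3

# Document-style storage of heterogeneous surgical case metadata:
# each record is a JSON document in a single text column, so records
# need not share a fixed set of fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, doc TEXT)")

docs = [
    {"procedure": "cholecystectomy", "duration_min": 85, "devices": ["insufflator"]},
    {"procedure": "appendectomy", "duration_min": 40},  # no device list recorded
]
conn.executemany("INSERT INTO cases (doc) VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# Query: all cases longer than one hour.
rows = [json.loads(doc) for (doc,) in conn.execute("SELECT doc FROM cases")]
long_cases = [c for c in rows if c["duration_min"] > 60]
assert [c["procedure"] for c in long_cases] == ["cholecystectomy"]
```

Dedicated NoSQL systems offer the same flexibility with indexing and querying over the document structure itself, which this minimal sketch delegates to application code.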
Through the rising relevance of web technology, storing data in the cloud is increasingly becoming a viable option. A vast array of services is available and has been applied in the medical domain (e.g. Amazon Web Services (AWS) (Holmgren and Adler-Milstein, 2017), Microsoft Azure (Hussain et al., 2013), and others). Storing data in the cloud has the potential to reduce health IT costs by lowering the locally required storage capacity and maintenance personnel, but brings with it privacy concerns and slower access to data than from local networks, which may be noticeable especially for large binary data such as medical images and video streams. While data privacy options are available for all major services, the implementing personnel have to understand these options and align them with the privacy needs of the institution and the respective data. Since answering these questions is complex, the privacy requirements strict, and the consequences of failing to comply with the law severe, the resulting solutions are often conservative with regard to privacy. Additionally, downloading large data sets may be costly, as cloud storage providers generally incentivize performing computations in the cloud.
Finally, solutions to facilitate local storage have been proposed. Commercially available systems such as SCENARA®.STORE (KARL STORZ) compress surgical images and video data over time to decrease storage needs. Alternatively, SDS tools can be used to selectively store critical video sequences instead of entire procedural videos, as recently proposed (Mascagni et al., 2021b).

Current challenges and next steps
The infrastructure-related mission as well as the corresponding goals generated by the consortium as part of the Delphi process are provided in Table 2. This section elaborates on some of the most fundamental aspects: How to enable prospective capturing and storing of relevant perioperative data? (goals 1.1/1.2): A major challenge we face is to capture all relevant perioperative data. While several initiatives and standards are already dedicated to this problem, a particular focus should be put on the recording and integration of patient outcome measures, including measures that need to be captured long after the patient has left the hospital (e.g. 5-year survival). The field of SDS stands in contrast to the field of radiology, where the DICOM standard now covers the exchange of medical images and related data. This standard can be seen as a direct result of market pressure: Early medical imaging devices did not prioritize communication standards, instead relying on manufacturer-supplied software specific to the hardware purchased. This behaviour did not change until PACSs became widespread, providing specialized software that offered a benefit to clinical workflows, and the ability to transmit images to them became a driving requirement for the purchase of new imaging hardware. However, the previously mentioned domain complexity also affects standard development. For example, the DICOM specification document alone consists of 6,864 pages, indicating the effort required to develop and maintain such a standard. Evolving standards for the exchange of medical data like IEEE 11073 SDC and HL7 FHIR are a step in the right direction, but in order to create a driving force, incentivizing the industry to enable widespread interconnection appears useful.
Storing acquired data is, in theory, largely possible with modern technologies. Missing, however, are standards for storage format, duration and data quality. These should be developed with the involvement of industrial stakeholders and the respective clinical/technical societies and should specifically include recommendations with respect to minimum standards for storage and annotation. The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES), for example, created an AI task force with the mission to propose and establish best practices for structured video data acquisition and storage, including recommendations for resolution and compression (Feldman et al., 2020). Generally speaking, a clear distribution of roles between different stakeholders, particularly regarding who takes the initiative, as well as a clear definition of the subject matter to be standardized are now needed.
How to link data from different sources and sites? (goal 1.3)-The need for exchanging data between different sources and sites calls for semantic interoperability (Section 3.3): Simply storing all data in a data lake without sufficient metadata management poses the risk of creating a data swamp that makes data extraction hard to impossible (Hai et al., 2016). Data distribution among several systems is a healthy approach since it reduces load on a single system and enables engineers to choose the system best suited for the specific types of data stored within. As long as metadata models (März et al., 2015; Soualmia and Charlet, 2016) exist that are able to sufficiently describe the data and where to find them, retrieval will be possible through querying the model. Accordingly, efforts should focus on enhancing current clinical information infrastructures from the level of syntactic interoperability to semantic interoperability. Metadata also becomes essential for data sharing. An increasingly popular approach to data sharing is federated learning (Konečný et al., 2016; Rieke et al., 2020). Instead of sharing data between institutions, the training of algorithms is distributed among participants. While this presumably reduces the ethical and legal complications associated with large-scale data sharing, it is still necessary to achieve semantic interoperability, and the regulatory issues regarding the exchange of models that contain encoded patient data are not fully understood yet.
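The core idea of federated learning can be illustrated with a minimal federated averaging (FedAvg) sketch: each institution trains on its private data, and only model parameters are exchanged and averaged by a coordinating server. The one-parameter linear model and the per-site data below are toy stand-ins, not a clinical system.

```python
# Minimal federated averaging sketch: two "institutions" fit y = w * x on
# private data drawn from the same underlying relation y = 3x. Only the
# parameter w crosses institutional boundaries, never the raw data.

def local_update(weights, data, lr=0.1):
    # One local pass of stochastic gradient descent on the site's own data.
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w

def federated_average(local_weights):
    # Server step: plain average of the institutions' model parameters.
    return sum(local_weights) / len(local_weights)

site_a = [(x, 3 * x) for x in [0.1, 0.2, 0.3]]  # private to institution A
site_b = [(x, 3 * x) for x in [0.4, 0.5]]       # private to institution B

w_global = 0.0
for _ in range(200):  # communication rounds
    w_a = local_update(w_global, site_a)
    w_b = local_update(w_global, site_b)
    w_global = federated_average([w_a, w_b])

print(round(w_global, 2))  # converges to the true slope 3.0
```

Real deployments weight the average by local data set size and add secure aggregation; as noted above, shared parameters may still encode patient information, which is why the regulatory status of such model exchange remains unsettled.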
How to perceive relevant tissue properties dynamically? (goal 1.4)-Surgical imaging modalities should provide discrimination of local tissue with a high contrast-to-noise-ratio, should be quantitative and digital, ideally be radiation- and contrast agent-free, enable fast image acquisition and be easy to integrate into the clinical workflow. The approach of registering 3D medical image data sets to the current patient anatomy for augmented reality visualization of subsurface anatomical details has proven ill-suited for handling tissue dynamics such as perfusion or oxygenation (e.g. for ischemia detection). The emerging field of biophotonics refers to techniques that take advantage of the fact that different tissue components feature unique optical properties for each wavelength. Specifically, spectral imaging uses multiple bands across the electromagnetic spectrum (Clancy et al., 2020) to extract relevant information on tissue morphology, function and pathology (see e.g. Wirkert et al. (2016); Moccia et al. (2018); Ayala et al. (2021)).
Benefiting from a lack of ionizing radiation, low hardware complexity and easy integrability into the surgical workflow, spectral imaging could be leveraged to inform surgical operators directly or be used for the generation of relevant input for SDS algorithms (Mascagni et al., 2018). Open research questions are, among others, related to reproducibility of measurements, possible confounders in the data (Dietrich et al., 2021), inter-patient variability and the robust quantification of tissue parameters in clinical settings.
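The principle behind spectral estimation of tissue parameters can be illustrated with a minimal linear unmixing sketch, in which absorbance at each wavelength is modeled as a linear combination of chromophore contributions. The absorption coefficients and measurements below are fabricated, not physiological values.

```python
# Toy linear spectral unmixing: recover oxy- and deoxyhemoglobin
# concentrations from absorbance at two wavelengths, then derive oxygenation.

def solve_2x2(a, b):
    # Solve the 2x2 linear system a @ x = b by Cramer's rule.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return x0, x1

# Rows: wavelengths; columns: absorption coefficients of HbO2 and Hb
# (made-up numbers for illustration only).
E = [[2.0, 1.0],
     [1.0, 3.0]]

# Absorbance fabricated from ground-truth concentrations HbO2=0.8, Hb=0.2.
measured = [2.0 * 0.8 + 1.0 * 0.2, 1.0 * 0.8 + 3.0 * 0.2]

hbo2, hb = solve_2x2(E, measured)
so2 = hbo2 / (hbo2 + hb)  # tissue oxygen saturation
print(round(so2, 2))  # → 0.8
```

In practice, many more wavelengths are used and the inversion is solved by least squares or, as in the cited learning-based work, by regression models trained to be robust to noise and confounders.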
How to enable real-time inference in interventional settings? (goal 1.5)-While processing times of several seconds or even minutes may be acceptable in some scenarios, other SDS applications, such as autonomous robotics, require real-time inference. Real-time inference requires a number of complex prerequisites to be fulfilled. Relevant data needs to be streamed to a common end point where it can be processed; data streams need to be sufficiently formalized to enable fully automatic decoding; the hardware and networks receiving these streams must be sufficiently fast to decode the streams with minimal latency and high resilience; and the algorithms that provide inference need to be implemented efficiently and run on sufficiently fast hardware to enable real-time execution. If additional data (e.g. preoperative imaging, patient-specific data) is required, the algorithms need to be able to access this data, and inferred information needs to be relayed to the OR team in an adequate manner. These problems can potentially be addressed in a variety of ways; however, it seems prudent to integrate the necessary infrastructure (acquisition, computation, display) directly on site in or near the OR. In a first step, test environments such as experimental operating rooms can serve as platforms where technical concepts for real-time inference can be developed, validated and evaluated in a realistic setting.
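The notion of a real-time budget can be made concrete with a small sketch that checks per-frame processing latency against a fixed deadline. The "inference" step is a placeholder, and the 40 ms budget is an assumed example corresponding to a 25 fps video stream.

```python
# Sketch of deadline monitoring for a real-time inference pipeline: each
# incoming frame must be processed within a fixed budget, or its result
# arrives too late to be useful to the OR team.
import time

FRAME_BUDGET_S = 0.040  # assumed budget: 40 ms per frame at 25 fps

def fake_inference(frame):
    # Placeholder for model inference; real systems would run an optimized
    # model on accelerator hardware.
    return sum(frame) / len(frame)

def process_stream(frames):
    missed = 0
    for frame in frames:
        start = time.perf_counter()
        fake_inference(frame)
        latency = time.perf_counter() - start
        if latency > FRAME_BUDGET_S:
            missed += 1  # deadline miss
    return missed

frames = [[float(i)] * 1000 for i in range(100)]
print(process_stream(frames))  # typically 0 missed deadlines for this toy load
```

A production system would additionally account for acquisition, network and display latency, which is precisely why integrating the whole chain near the OR, as argued above, is attractive.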
How to overcome regulatory and political hurdles? (goal 1.6)-Timelines and associated costs of data privacy management (discussed further in Section 4.4) and regulatory processes need to be supported in both academic and commercial projects: Academic work requires funding and appropriate provision for delays in the project timeline. Notably, the COVID-19 pandemic may have stimulated rapid response from both academic and regulatory bodies in response to urgent needs, and perhaps some of this expedience will remain (examples include Continuous Positive Airway Pressure (CPAP) devices such as UCL-Ventura). Industry also needs to allocate costs, adhere to and maintain standards, cover liability and have clear expectations of the required resources. While these processes are well developed and supported in large organizations, smaller companies, in particular startups, have fewer resources at their disposal for them. A variety of additional standards would also need to be met, since a prospective SDS system approaches the definition of a medical device as laid out by the U.S. Food and Drug Administration (FDA) (USA) or the Medical Device Regulation (MDR) (EU). These may require ISO certification, audits and approval from regulatory agencies and notified bodies, compliance with data protection regulations (e.g. GDPR), more stringent (cyber-)security features and testing adherence. As the field of AI and its regulation is increasingly discussed in public venues, political visibility is rising. By clearly identifying the limiting effects of insufficient infrastructure on the one hand, and the potential benefits of improving it on the other, it should become possible to convince political and clinical stakeholders that an investment in HIT as well as dedicated data management and processing personnel is key to exploiting the potential of AI for interventional healthcare.
Furthermore, industrial engagement in creating the necessary infrastructure needs to be fostered within the boundaries of global standardization while considering the specific market needs. Healthcare institutions thus need to engage globally with industry to put forth common standards and processes enabling SDS applications compatible with strategic business needs. Of note, existing infrastructures can be leveraged and enhanced in this process. The SDS community should be aware of the complexity of the topic and the messages that are publicized (i.e. premature success stories) and create constructive proposals with realistic outlooks on potential benefits, focusing on long-term investments with the potential to drive change. Specifically, market studies could identify for each individual stakeholder the benefits of SDS solutions compared to their expected costs. Consider for instance a "number needed to treat" type of example, where for every X number of patients for which data insights are applied, one complication costing USD Y may be avoided. By providing estimated returns on investment for improvements to clinical delivery based on reducing person-hours, complications, or duplicative work, such studies would in turn provide key arguments for future investments.
Overall, local and international collaborations and partnerships involving clinical, patient, academic, industry and political stakeholders are needed (see Table 1). Policies and procedures regarding data governance within an institution have to be defined that involve all stakeholders within the SDS data lifecycle. Already existing multinational political entities or governing bodies, as exemplified by the EU, can be leveraged in a first step toward international collaboration and standardization. When implementing the goals put forth in Table 2, internationally agreed standards should be respected. These include, but are not limited to, ethical guidelines. In fact, the World Health Organization (WHO) recently put forth a guidance document on Ethics & Governance of Artificial Intelligence for Health (WHO, 2021), which was compiled by a multidisciplinary team of experts from the fields of ethics, digital technology, law and human rights, as well as experts from Ministries of Health. The report identifies the ethical challenges and risks associated with the use of AI in healthcare and puts forth several internationally agreed on best practices for both the public and the private sector.

Data annotation and sharing
Access to annotated data is one of the most important prerequisites for SDS. Several requirements impact the quality of annotated data sets. Ideally, they should include multiple centers to capture possible variations, use defined protocols for acquisition and annotation, and preferably be linked to patient outcome. In addition, the data set has to be representative of the task to be solved and combined with well-defined criteria for validation and replication of results. Broadly, the key considerations when generating an annotated data set include reliability, accuracy, efficiency, scalability, cost, representativeness and correct specification.

Current practice
A comprehensive list of available curated data sets that are relevant to the field of SDS is provided in Appendix A. In general, they serve as a good starting point, but are still relatively small, often tied to a single institution, and extremely diverse in structure, nomenclature, and target procedure.
Surgical data such as video involves diverse annotations of different granularity, depending on the clinical use case to be solved. A distinction can be made between spatial, temporal and spatio-temporal annotations. Examples of spatial annotations include image-level classification (e.g. what tissue/tools/events are visible in an image), semantic segmentation (e.g. which pixels belong to which tissue/tools/events in an image) and numerical regression (e.g. what is the tissue oxygenation at a certain location). Temporal annotations involve the surgical workflow and can have different levels of granularity, e.g. surgical phases at the highest level, which consist of several steps, which are in turn composed of activities such as suturing or knot-tying (Lalys and Jannin, 2014). In addition, specific events such as complications, performance or quality assessment of specific tasks complement temporal annotations. Spatio-temporal annotations involve both spatial and temporal information.
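The annotation granularities above might be represented in code roughly as follows; the class and field names are purely illustrative, not drawn from any existing annotation standard.

```python
# Illustrative data structures for the three annotation types described above.
from dataclasses import dataclass, field

@dataclass
class TemporalAnnotation:
    # e.g. a surgical phase, with start/end times in seconds
    label: str
    start_s: float
    end_s: float

@dataclass
class SpatialAnnotation:
    # e.g. an image-level class label for a single frame
    frame_index: int
    label: str

@dataclass
class SpatioTemporalAnnotation:
    # e.g. an instrument tracked across frames within a time interval
    label: str
    start_s: float
    end_s: float
    frame_annotations: list = field(default_factory=list)

phase = TemporalAnnotation("dissection", 310.0, 1450.0)
duration_s = phase.end_s - phase.start_s
print(duration_s)  # duration of the annotated phase in seconds
```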
While simple annotation tasks such as labeling surgical instruments may be accomplished by non-experts (Maier-Hein et al., 2014), more complex tasks such as tissue labeling or quality assessment of anastomoses most likely require domain experts.
The major bottleneck for data annotation in surgical applications is access to expert knowledge. Reducing the annotation effort is therefore of utmost importance, and various methods have been proposed. Crowdsourcing has proven to be a successful method, but designing the task such that non-experts are able to provide meaningful annotations remains one of the biggest challenges. Recently, active learning approaches have been proposed that determine which unlabeled data points would provide the most information and thus reduce the annotation effort to these samples (Bodenstedt et al., 2019a). Similarly, error detection methods reduce the annotation effort to erroneous samples only (Lecuyer et al., 2020). Data can also be annotated directly during acquisition (Padoy et al., 2012; Sigma Surgical Corporation).
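A minimal sketch of the uncertainty-based active learning idea mentioned above: given model confidence scores for unlabeled samples (fabricated here), annotation effort is directed to the samples the model is least certain about.

```python
# Least-confidence sampling: rank unlabeled samples by the model's maximum
# class probability and send only the most uncertain ones to the annotator.

def least_confident(predictions, budget):
    # predictions: {sample_id: max class probability}; lower = more uncertain.
    ranked = sorted(predictions, key=lambda s: predictions[s])
    return ranked[:budget]

unlabeled = {
    "frame_001": 0.97,  # model is confident; annotating adds little
    "frame_002": 0.51,  # near the decision boundary; most informative
    "frame_003": 0.88,
    "frame_004": 0.60,
}

to_annotate = least_confident(unlabeled, budget=2)
print(to_annotate)  # → ['frame_002', 'frame_004']
```

More elaborate criteria (entropy, committee disagreement) follow the same pattern: score uncertainty, then spend the limited expert budget where it buys the most information.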

Key initiatives and achievements
One of the most successful initiatives fostering access to open data sets is Grand Challenge, which provides infrastructure and tools for organizing challenges in the context of biomedical image analysis. The platform hosts several challenges including data sets and also serves as a framework for end-to-end development of ML solutions. Notably, the Endoscopic Vision Challenge EndoVis, an initiative that takes place at the international conference hosted by the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society, is the largest source of SDS data collections (Bernal et al.). The importance of public data sets in general is illustrated by new journals dedicated to only publishing high-quality data sets, such as Nature Scientific Data. An important contribution in this context are the FAIR data principles (Wilkinson et al., 2016), already introduced in the context statement above. Recently, the Journal of the American Medical Association (JAMA) Surgery partnered with the Surgical Outcomes Club and launched a series consisting of statistical methodology articles and a checklist that aims to elevate the science of surgical database research (Haider et al., 2018). It also includes an overview of the most prominent surgical registries and databases, e.g. the National Cancer Database (Merkow et al., 2018), the National Trauma Data Bank (Hashmi et al., 2018) and the National Surgical Quality Improvement Program (Raval and Pawlik, 2018).
Annotation of data sets requires consistent ontologies for SDS. The OntoSPM project (Gibaud et al., 2014) is the first initiative focused on modeling the entities of surgical process models, from which LapOntoSPM (Katić et al., 2016a) was derived for laparoscopic surgery. OntoSPM is now organized as a collaborative action associating a dozen research institutions in Europe, with the primary goal of specifying a core ontology of surgical processes, thus gathering the basic vocabulary to describe surgical actions, instruments, actors, and their roles. An important endeavor that builds upon current initiatives was recently initiated by SAGES, which hosted an international consensus conference on video annotation for surgical AI. The goal was to define standards for surgical video annotation based on different working groups regarding temporal models, actions and tasks, tissue characteristics and general anatomy as well as software and data structure.

Standards, platforms and tools
In SDS, images or video are typically the main data sources since they are ubiquitous and can be used to capture information at different granularities, ranging from cameras observing the whole interventional room or suite to cameras inserted into the body endoscopically or observing specific sites through a microscope (Chadebecq et al., 2020). Different image/video annotation tools for spatial, temporal and spatio-temporal annotations already exist (Table C.1), but to date no gold standard framework enabling different annotation types combined with AI-assisted annotation methods exists in the field of SDS.
Consistent annotation requires well-defined standards and protocols taking different clinical applications into account. Current initiatives are working on the topic of standardized annotation, but no widely accepted standards have resulted from the efforts yet. Notable exceptions can be seen in the fields of skill assessment, where annotations have been required for a long time to rate students and can serve as an example for different kinds of SDS annotation protocols, and in cholecystectomy, where methods for consistent assessment of photos (Sanford and Strasberg, 2014) and videos (Mascagni et al., 2020a) of the Critical View of Safety (CVS) were developed to favour documentation of this important safety step.
Data annotation also requires a consistent vocabulary, preferably modeled as an ontology (Section 3). Several relevant ontologies with potential use in surgery such as the Foundational Model of Anatomy (FMA), SNOMED CT or RadLex (Langlotz, 2006) are already available. Existing initiatives like the OBO Foundry project, which focuses on biology and biomedicine, provide further evidence that building and sharing interoperable ontologies stimulates data sharing within a domain. In biomedical imaging, ontologies have been successfully used to promote interoperability and sharing of heterogeneous data through consistent tagging (Gibaud et al., 2011; Smith et al., 2015).
The challenges and needs for gathering large-scale, representative and high-quality annotated data sets are certainly not limited to SDS. In response, a new industry branch has emerged offering online data set annotation services through large organized human workforces. A listing of the major companies is provided in Table C.2. Interestingly, in 2019 the market was estimated to grow to more than USD 1 billion by 2023 (Cognilytica, 2019), while the subsequent annual report from 2020 raised this estimate to more than USD 4.1 billion by 2024 (Cognilytica, 2020). Most companies recruit non-specialists who can perform conceptually simple tasks on image and video data, such as urban scene segmentation and pedestrian detection for autonomous driving. Recently, several companies such as Telus International (Vancouver, BC, CA) and Edgecase AI LLC (Hingham, MA, US) have started offering medical annotation services performed by networks of medical professionals. However, it is unclear to what extent medical image data annotation can be effectively outsourced to such companies, particularly in the case of surgical data, where important context information may be lost. Furthermore, the associated costs of medical professionals as annotators and annotation reviewers for quality assurance may render these services out of reach for many academic institutes and small companies.

Current challenges and next steps
The data annotation-related mission as well as corresponding goals generated by the consortium are provided in Table 3. This section elaborates on some of the most fundamental aspects: How to develop standardized ontologies for surgical data science? (goal 2.1)-As current practices and standards differ greatly between different countries, clinical sites, and healthcare professionals, publicly available surgical data sets generally display vast variation in terms of their annotations. The field, however, is in need of standardized annotations based on a common vocabulary, which can be achieved by shared ontologies. For example, evaluating the efficacy of a particular procedure requires a standardized definition and nomenclature for the different hierarchy levels, e.g. the phases, steps/tasks and activities/actions. A standardized nomenclature along with specifics such as the beginning and end of temporal events does not exist yet. Studies can help standardize these definitions and reach a consensus. This is for instance demonstrated by Kaijser et al. (2018), who conducted a Delphi consensus study to standardize the definitions of crucial steps in the common procedures of gastric bypass and sleeve gastrectomy. Such processes could be adopted for other domains, with the Delphi method being a particularly useful tool to agree on terminology. Once available and broadly adopted, a shared ontology would stimulate the community as well as boost data and knowledge exchange in the entire domain of SDS.
Less formal options such as terminologies are an alternative but may reach their limits in the long term.
How to account for biases? (goal 2.2)-Various sources and types of bias with potential relevance to SDS have been identified in the past (Ho and Beyan, 2020). Among the most critical are selection bias and confounding bias. Selection bias, also called sample bias, refers to a selection of contributing data in a way that does not allow for proper randomization or representativeness to be achieved. Crucially, in the context of SDS, representativeness refers to numerous factors including variances related to patients (e.g. age, gender, origin), the surgical procedure (e.g. adverse events), input data (e.g. device type, protocol; preprocessing methods), and surgeons (e.g. level of expertise). Creating a fully representative data set is thus highly challenging and only possible in a multi-center setting. Unrepresentative data, on the other hand, leads to biased algorithms. A recent study published in the context of radiological data science (Larrazabal et al., 2020), for example, showed that the performance of AI algorithms for a specific sex (e.g. female) crucially depends on the ratio of samples from the respective sex in the training data set. Another source of overestimation regarding algorithm performance is confounding bias.
Confounding "arises when variables that are not mediators of the effect under study, and that can explain part or all of the observed association between the study exposure and the outcome, are not measured and controlled for during study design or analysis" (Arah, 2017). Recent work in biomedical image analysis (Badgeley et al., 2019;Roberts et al., 2021;Dietrich et al., 2021) showed that knowledge of confounding variables is crucial to the development of successful predictive models. Conversely, a striking recent example of a confounder rendering results meaningless can be seen in the many papers using a particular pneumonia data set as a control group in the development of COVID-19 detection and prognostication models. Since this data set solely consists of young paediatric patients, any model using adult COVID-19 patients and these patients as a control group would likely overperform merely by detecting children (Roberts et al., 2021). Other examples of confounders (also called hidden variables) are chest drains and skin markings in the context of pneumothorax (Oakden-Rayner et al., 2020) and melanoma diagnosis (Winkler et al., 2019). Recognizing and minimizing potential biases in SDS by enhancing data sets with, for example, relevant metadata is thus of eminent importance.
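The paediatric-control example can be mimicked with a toy simulation: a "classifier" that only detects a confounder appears to perform well while the confounder tracks the label, and collapses once that correlation is removed. All numbers are synthetic.

```python
# Toy demonstration of confounding bias: predicting the label purely from a
# confounding variable (e.g. patient age group) looks accurate on a biased
# data set and fails on an unbiased one.
import random

random.seed(42)

def make_dataset(n, p_conf_given_positive, p_conf_given_negative):
    data = []
    for _ in range(n):
        label = random.random() < 0.5
        p = p_conf_given_positive if label else p_conf_given_negative
        confounder = random.random() < p
        data.append((confounder, label))
    return data

def accuracy(data):
    # "Classifier" that returns the confounder as its prediction.
    return sum(conf == label for conf, label in data) / len(data)

# Development-like setting: confounder almost perfectly tracks the label
# (cf. paediatric pneumonia controls vs. adult COVID-19 cases).
biased = make_dataset(2000, 0.95, 0.05)
# Deployment-like setting: confounder independent of the label.
unbiased = make_dataset(2000, 0.5, 0.5)

print(round(accuracy(biased), 2), round(accuracy(unbiased), 2))
```

The apparent performance on the biased set is entirely an artifact of the confounder, which is why recording relevant metadata, as argued above, is a precondition for detecting such effects.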
How to make data annotation more efficient? (goal 2.3)-Overcoming the lack of experienced observers might be possible through embedding clinical data annotation in the education and curricula of medical students. In fact, early evidence suggests that annotating surgical skills during video-based training improves the learning experience (De La Garza et al., 2019). The annotation process could also involve several stages, starting with annotations by non-experts that are reviewed by experts. In a similar fashion, active learning methods reduce the annotation effort to the most uncertain samples (Bodenstedt et al., 2019a; Maier-Hein et al., 2016). An alternative approach to overcome the lack of annotated data sets is to generate realistic synthetic data based on simulations. A challenge in this context is to bridge the domain gap, so that models trained on synthetic data generalize well to real data. Promising approaches already studied in the context of SDS are for example generative adversarial networks (GANs) for image-to-image translation of laparoscopic images (Pfeiffer et al., 2019; Rivoir et al., 2021) or transfer learning-based methods for physiological parameter estimation (Wirkert et al., 2017). In the context of photoacoustic imaging, recent work has further explored the GAN-based generation of plausible tissue geometries from available imaging data.
How to establish common standards, protocols and best practices for quality-assured data annotation? (goals 2.3-2.6/2.9)-Standardized open-source protocols that include well-defined guidelines for data annotation are needed to provide accurate labels. Ideally, the annotations should be generated by multiple observers and the protocol should be defined to reduce inter-observer variability and bias. A recent study in the context of CT image annotation concluded that more than three annotators might be necessary to establish a reference standard (Joskowicz et al., 2019). Comprehensive labeling guides and extensive training are necessary to ensure consistent annotation. Shankar et al. (2020), for example, proposed a 400-page labeling guide in the context of ImageNet annotations to reduce common human failure modes such as fine-grained distinction of classes. In SDS, a protocol with checklists and examples on how to consistently segment hepatocystic anatomy and assess the CVS in laparoscopic cholecystectomy was recently published to favour reproducibility and trust in the clinical relevance of annotations (Mascagni et al., 2021a). Such detailed annotation protocols and extensive user training supported by adequate training material are now required. However, establishing annotation guides for surgical video data is a particularly challenging task, since it involves complex actions that require understanding of the surgical intent based on visual cues. In particular, temporal annotations such as phase transitions are often challenging, as the start and end of a specific phase are hard to define. Ward et al. (2021) provide a comprehensive list of challenges associated with surgical video annotation. Taking into account the variety of surgical techniques, this may lead to annotation inconsistencies even amongst experts, but these could also be used as a hint to estimate the difficulty associated with a surgical situation.
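Inter-observer variability, as discussed above, is commonly quantified with chance-corrected agreement measures. The following sketch computes Cohen's kappa for two hypothetical annotators labeling the same ten video frames; the labels are fabricated.

```python
# Cohen's kappa: agreement between two annotators, corrected for the
# agreement expected by chance given each annotator's label frequencies.

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Two annotators assigning surgical phase labels to the same ten frames.
annotator_1 = ["prep", "prep", "dissect", "dissect", "dissect",
               "clip", "clip", "cut", "cut", "cut"]
annotator_2 = ["prep", "prep", "dissect", "dissect", "clip",
               "clip", "clip", "cut", "cut", "dissect"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # → 0.73 (substantial, but not perfect, agreement)
```

Reporting such agreement statistics alongside an annotation protocol makes it possible to judge whether a labeling guide actually achieves consistent annotations.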
In this context, research on the needs with respect to data and annotation quality in the context of the clinical goals is also required. As data sets and annotations evolve over time, another aspect to be taken into account involves versioning of data sets and annotations, similar to code, which is a non-trivial task (Marzahl et al., 2021). For all tasks related to data annotation, it will be prudent to establish and enforce best practices, e.g. in the form of standardized annotation protocols, that can easily be integrated into the surgical workflow. Once these are established, adherence to best practices could be increased by journal editors explicitly requesting annotation protocols to be submitted along with a respective paper that is based on annotated data. Journals could also allow for the explicit publication of annotation protocols in analogy to study protocols. Finally, platforms that enable spatial as well as temporal annotation in a collaborative manner and share common annotation standards and protocols as well as ML-based methods to facilitate automatic annotations are crucial. One means is to adapt already existing annotation platforms (see Table C.1) to fit the specific needs of SDS. Funding agencies should explicitly support efforts to make progress in this regard. Overall, a particularly promising approach to generating progress with respect to annotation standards is to start from the respective societies, such as SAGES. Alternatively or additionally, international working groups, similar to the one developing the DICOM standard, should be established. Such working groups should collaborate with existing initiatives, such as DICOM or HL7.
In the end, standards will only be successful if enough resources are invested into the actual data annotation. Here, various non-monetary incentives should be considered, including gamification and the issuing of certificates (e.g. a Certified Professional for Medical Data Annotation in analogy to the Certified Professional for Medical Software).
How to incentivize and facilitate data sharing across institutions? (goals 2.7-2.9)-Data anonymization is a key enabler for sharing medical data and advancing the SDS field. By definition, anonymized data cannot be traced back to the individual, and in both the USA and the EU, anonymized data are not considered personal data, rendering them out of the scope of privacy regulation such as the GDPR. However, achieving truly anonymized data is usually difficult, especially when multiple data sources from an individual are linked in one data set. Removing identifiable metadata such as sensitive DICOM fields linking the patient to the medical image is necessary but not always sufficient for anonymization. For example, removing DICOM fields in a magnetic resonance imaging (MRI) scan of a patient's head is not sufficient because the individual may be identified from the image data through facial recognition (Schwarz et al., 2019). Full anonymization also exhibits the drawback that it becomes difficult to identify potential existing biases in data sets. Pseudonymization is a weaker form of anonymization where data cannot be attributed to an individual unless they are linked with other data held separately (European Parliament and Council of European Union, 2016). This is often easier to achieve than true anonymization; however, pseudonymized data are still defined as personal data, and as such remain within the scope of the GDPR. The public data sets used in SDS research, such as endoscopic videos recorded within the patient's body, are generally assumed to be anonymized, but clear definitions and regulatory guidance are needed. Recent advances in federated learning could reduce security and privacy concerns since they rely on sharing machine learning models rather than the data itself (Kaissis et al., 2020) (see Section 3). A complementary strategy for bypassing current hurdles related to data sharing is data donation.
Medical Data Donors e.V., for example, is a registered German non-profit organization, designed to build a large annotated image database which will serve as a basis for medical research. It can be supported by the public via donation of medical imaging data or by shopping at Amazon Smile. In the broader context of data donation, the SDS initiative discussed the concept of a data donor card in analogy to the existing organ donor card. With such a card, patients could explicitly state which kind of data they are willing to share with whom and under which circumstances. Overall, making progress on large public databases will require establishing an interlocking set of standards, technical methods, and data analysis tools tied to metrics to support reproducible SDS (Nichols et al., 2017) and provide value for the community.
Clinical registries provide a good example of such a mechanism. In a registry, a specific area of practice agrees on data to be shared, outcome measures to be assessed, and standardized formats as well as quality measures for the data (Arts et al., 2002). Identifying areas of SDS where the value proposition exists to drive the use of registries would provide much-needed impetus to create data archives. So would creating more monetary and non-monetary incentives for institutions, clinical staff and patients to share and annotate data, although particularly the issue of incentivizing patients to share data presents an ethical gray area.
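The metadata-stripping and pseudonymization steps discussed above can be sketched in a few lines. The field names and the salted-hash scheme below are illustrative assumptions, not a complete de-identification protocol; as noted above, removing metadata alone does not guarantee anonymization.

```python
import hashlib

# Illustrative subset of identifying attributes; a real de-identification
# profile (e.g. per DICOM PS3.15) covers many more fields.
IDENTIFYING_FIELDS = {"PatientName", "PatientID", "PatientBirthDate", "InstitutionName"}

def pseudonymize(record: dict, secret_salt: str) -> dict:
    """Strip direct identifiers and replace the patient ID with a salted hash.

    The salted hash acts as a pseudonym: re-identification is only possible
    for whoever holds the salt, so the output remains personal data under
    the GDPR (pseudonymized, not anonymized).
    """
    out = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    token = hashlib.sha256((secret_salt + record["PatientID"]).encode()).hexdigest()[:16]
    out["PseudonymID"] = token
    return out

record = {"PatientID": "12345", "PatientName": "Doe^Jane",
          "PatientBirthDate": "19700101", "Modality": "MR",
          "StudyDescription": "Head"}
clean = pseudonymize(record, secret_salt="site-A-secret")
print(sorted(clean))  # only non-identifying fields plus the pseudonym remain
```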

Data analytics
Data analytics (addressing the interpretation task in Fig. 1) is often regarded as the core of any SDS system. The perioperative data is processed to derive information addressing a specific clinical need, where applications may range from prevention and training to interventional diagnosis, treatment assistance and follow-up (Maier-Hein et al., 2017).

Current practice
Surgical practice has traditionally been based on observational learning, and decision making before, during and after surgical procedures highly depends on the domain knowledge and past experiences of the surgical team. SDS has the potential to initiate a paradigm shift with a data-driven approach (Hager et al., 2020;Vercauteren et al., 2020). Bishop and others classify data analytics tools as descriptive, diagnostic, predictive, and prescriptive (Bishop, 2006;Tukey, 1977): Descriptive analytics tools -What happened?-Descriptive analytics primarily provide a global, comprehensive summary of data made available through data communication such as simple reporting features. Syus' Periop Insight (Syus, Inc., Nashville, TN, USA) is an example of how descriptive analytics are used to access data, view key performance metrics, and support operational decisions through documentation and easy interpretation of historical data on supply costs, delays, idle time etc., related to overall operating room efficiency and utilization. Business Intelligence (BI) (Chen et al., 2012) tools are a typical form of descriptive analytics tools, comprising an integrated set of IT tools to transform data into information and then into knowledge, and have been used in healthcare settings (Ward et al., 2014) (e.g. Sisense ™ (Sisense Ltd., New York City, NY, USA), Domo ™ (Domo, Inc., American Fork, UT, USA), MicroStrategy ™ (MicroStrategy Inc., Tysons Corner, VA, USA), Looker ™ (Looker Data Sciences Inc., Santa Cruz, CA, USA), Microsoft Power BI ™ (Microsoft Corporation, Redmond, WA, USA) and Tableau ™ (Tableau Software Inc., Seattle, WA, USA)). These tools often incorporate features such as interactive dashboards (Upton, 2019) that provide customized graphical displays of key metrics, historical trends, and reference benchmarks, and can be used to assist in tasks such as surgical planning, personalized treatment, and postoperative data analysis.
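The key performance metrics surfaced by such descriptive tools are typically simple aggregates. A minimal sketch, using invented toy case times rather than any vendor's actual data model, might compute OR utilization and turnover as follows:

```python
from statistics import mean

def minutes(t: str) -> int:
    """Convert an HH:MM time string to minutes since midnight."""
    h, m = map(int, t.split(":"))
    return h * 60 + m

# Toy case log for one operating room on one day (illustrative values).
cases = [
    {"start": "08:00", "end": "09:30"},
    {"start": "10:00", "end": "12:15"},
    {"start": "13:00", "end": "15:30"},
]

# Utilization: time in surgery divided by the staffed block time.
busy = sum(minutes(c["end"]) - minutes(c["start"]) for c in cases)
available = minutes("16:00") - minutes("08:00")  # 8:00-16:00 staffed block
utilization = busy / available

# Turnover: idle gaps between consecutive cases.
turnovers = [minutes(b["start"]) - minutes(a["end"]) for a, b in zip(cases, cases[1:])]
print(f"utilization {utilization:.0%}, mean turnover {mean(turnovers):.0f} min")
```

A dashboard would aggregate such metrics across rooms and dates and plot them against reference benchmarks.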
Diagnostic analytics tools -why did it happen?-Diagnostic analytics tools, on the other hand, explore the data, address the correlations and dependencies between variables, and focus on interpreting the factors that contributed to a certain outcome through data discovery and data mining. These tools can facilitate the understanding of complex processes and reveal relationships between variables, or find root causes. For example, clinicians can use data on postoperative care to assess the effectiveness of a treatment (Bowyer and Royse, 2016;Kehlet and Wilmore, 2008).
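The basic building block of such diagnostic analyses is quantifying the association between a care variable and an outcome. A minimal sketch on synthetic data (the relationship between mobilization timing and length of stay is invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical postoperative records: a synthetic linear relationship
# between time to first mobilization (hours) and length of stay (days),
# plus noise. No clinical claim is implied by these numbers.
mobilization_h = rng.uniform(6, 48, size=200)
length_of_stay = 2.0 + 0.08 * mobilization_h + rng.normal(0, 0.5, size=200)

# Pearson correlation as a first-pass association measure.
r = np.corrcoef(mobilization_h, length_of_stay)[0, 1]
print(f"Pearson r = {r:.2f}")
```

Real diagnostic tools go well beyond pairwise correlation (e.g. controlling for confounders), and, as discussed later in this section, correlation alone does not establish causation.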
Predictive and prescriptive analytics tools -What will happen? How can we make it happen?-Predictive analytics uses historical data, performs an in-depth analysis of key historical trends, underlying patterns and correlations, and uses the insights gained to make predictions about what will likely happen next (What will happen?). Prescriptive analytics complement predictive analytics by offering insights into what actions can be taken to achieve target outcomes (How can we make it happen?). ML can meet these needs, but the challenges specific to surgery are manifold, as detailed in Maier-Hein et al. (2017). Importantly, the preoperative, intraoperative and postoperative data processed are potentially highly heterogeneous, consisting of 2D/3D/4D imaging data (e.g. diagnostic imaging data), video data (e.g. from medical devices or room cameras), time series data (e.g. from medical devices or microphones), and more (e.g. laboratory results, patient history, genome information). Furthermore, while the diagnostic process follows a rather regular flow of data acquisition, the surgical process varies significantly and is highly specific to patient and procedure. Finally, team dynamics play a crucial role. In fact, several studies have demonstrated a correlation between nontechnical skills, such as team communication, and technical errors during surgery (Hull et al., 2012). While first steps have been taken to apply ML to open research problems with applications ranging from decision support (e.g. determining surgical resectability (Marcus et al., 2020)) to data fusion for enhanced surgical vision (e.g. Akladios et al. (2020)), the vast majority of research has not yet made it to clinical trial stages. Section 5.4 highlights several challenges that need to be addressed in order to effectively adopt ML as an integral part of surgical routine.

Key initiatives and achievements
This section reviews some key initiatives and achievements from both an industrial and an academic perspective. Industrial initiatives: A prominent example is IBM's (Armonk, NY, USA) Watson Health (Strickland, 2019). Despite its limitations, Watson Health has been shown to be effective in certain narrow, controlled applications. For example, Watson for Genomics is used by genetics labs that generate reports for practicing oncologists. Given the information on a patient's genetic mutations, it can generate a report that describes all relevant drugs and clinical trials (Strickland, 2019). Other companies, societies and initiatives, such as Google (Mountain View, CA, USA) DeepMind Health (Graves et al., 2016;Tomašev et al., 2019), Intel (Santa Clara, CA, USA) (Healthcare IT News, 2012) and the American Society of Clinical Oncology (ASCO) CancerLinQ® (Sledge et al., 2013), have also been focusing on clinical data, but industrial success stories in surgery at scale are still lacking, as detailed in Section 6.
Academic initiatives: In academia, interdisciplinary collaborative large-scale research projects have developed data analytics tools to address different aspects of SDS. The Transregional Collaborative Research Center "Cognition Guided Surgery" focused on the development of a technical-cognitive assistance system for surgeons that explores new methods for knowledge-based decision support for surgery as well as intraoperative assistance (Katić et al., 2016b). First steps toward the operating room of the future have recently been taken, focusing on different aspects such as advanced imaging and robotics, multidimensional data modeling, acquisition and interpretation, as well as novel human-machine interfaces for a wide range of surgical and interventional applications. Broadly speaking, much of the academic work in SDS is currently focusing on the application of ML methods in various contexts (Navarrete-Welton and Hashimoto, 2020;Zhou et al., 2019b;Alapatt et al., 2020), but clinical impact remains to be demonstrated (see Section 6).

Standards, platforms and tools
A broad range of software tools are used by the SDS community each day, reflecting the interdisciplinary nature of the field. Depending on the SDS application, tools may be required from the following technical disciplines that intersect with SDS: classical statistics, general ML, deep learning, data visualization, medical image processing, registration and visualization, computer vision, natural language processing (NLP), signal processing, surgery simulation, surgery navigation and augmented reality (AR), robotics, BI and software engineering. Many established and emerging software tools exist within each discipline, and a comprehensive list would be vast and continually growing. In Table B.3, we have listed software tools that are commonly used by SDS practitioners today, organized by the technical disciplines mentioned above. In this section, we focus on ML frameworks and the regulatory aspects of software development for SDS.

ML frameworks and model standards:
ML is today one of the central themes of SDS analytics, and many frameworks are used by the SDS community. The scikit-learn library in Python is the most widely used framework for ML-based classification, regression and clustering using non-DL models such as Support Vector Machines (SVMs), decision trees and multi-layer perceptrons (MLPs). DL, the sub-field of ML that uses Artificial Neural Networks (ANNs) with many hidden layers, has exploded over the past 5 years, in part due to mature DL frameworks (Nguyen et al., 2019). Other useful tools include training progress visualization with TensorBoard, and AutoML systems for efficient automatic hyperparameter and model architecture search, such as H2O, auto-sklearn, AutoKeras and Google Cloud AutoML. NVIDIA DIGITS takes framework abstraction a step further with a web application to train DL models for image classification, segmentation and object detection, and a graphical user interface (GUI) suitable for non-programmers. Such tools are relevant in SDS, where clinical researchers can increasingly train standard DL models without any programming or ML experience (Faes et al., 2019). On the one hand, this is beneficial for technology democratization; on the other hand, it elevates the known risks of treating ML and DL systems as "black boxes" (PHG Foundation, 2020). Recently, NVIDIA has released NVIDIA Clara, a software infrastructure to develop DL models specifically for healthcare applications with large-scale collaboration and federated learning.
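As a minimal illustration of the non-DL workflow mentioned above, a scikit-learn SVM classifier can be trained and evaluated in a few lines. Synthetic data stands in for clinical features here; this is a generic sketch, not a surgical model:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for tabular perioperative features.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"test accuracy: {acc:.2f}")
```

The pipeline abstraction keeps preprocessing and model fitting coupled, which avoids leaking test-set statistics into training.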
Each major framework has its own format for representing and storing ML models and associated computation graphs. There are now efforts to standardize formats to improve interoperability and model sharing, and to reduce framework lock-in. Examples include the Neural Network Exchange Format (NNEF), developed by the Khronos Group with participation from over 30 industrial partners, Open Neural Network Exchange (ONNX), and Apple's (Cupertino, CA, USA) Core ML. For sharing the source code used to train and test these models, GitHub is undeniably the most important platform, used extensively by SDS practitioners, and it greatly helps to promote research code reusability and reproducibility. "Model Zoos" (e.g. Model Zoo, ONNX Model Zoo) are also essential online tools that allow easy discovery and curation of many of the landmark models from the research literature.

Regulatory software standards:
The usual research and development pipeline for SDS software involves development at various stages, including data collection and curation, model training, model testing, application deployment, distribution, monitoring, model improvement, and finally a medically approved product. For the classification as a medical product, the purpose intended by the manufacturer is more decisive than the functions of the software. Software is a "medical device software" (or "software as a medical device" (SaMD)) if "intended to be used, alone or in combination, for a purpose as specified in the definition of a medical device in the medical devices regulation or in vitro diagnostic medical devices regulation" (MDCG 2019-11), i.e. if intended to diagnose, treat or monitor diseases and injuries. The manufacturer of an SDS software application as SaMD needs to ensure that the safety of the product is systematically guaranteed and prove that they have sufficient competencies to ensure the relevant safety and performance of the product according to the state of the art, keeping evidence for development, risk management, data management, verification and validation, post-market surveillance and vigilance, service, installation, decommissioning, customer communication, and the monitoring of applicable new or revised regulatory requirements.
Yet, ML-based software requires particular considerations (Gerke et al., 2020). For example, the fact that models can be improved over time with more training data (often called the "virtuous cycle") is not well handled by these established standards. In 2019, the FDA published a "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD)", specifically aimed at clarifying this subject (FDA, 2019). In contrast to the previously "locked" algorithms and models, this framework formulates requirements on using Continuous Learning Systems (CLS) and defines a premarket submission to the FDA when the AI/ML software modification significantly affects device performance, or safety and effectiveness; the modification is to the device's intended use; or the modification introduces a major change to the SaMD algorithm. The implementation of these requirements, especially with regard to actual product development, is an unsolved problem.

Current challenges and next steps
The data analytics-related mission as well as corresponding goals generated by the consortium are provided in Table 4. This section elaborates on the most important research questions from an ML methodological perspective: How to ensure robustness and generalization? (goal 3.1)-Models trained on the data from one clinical site may not necessarily generalize well to others due to variability in devices, individual practices of the surgical team or patient demographics. While data augmentation (Itzkovich et al., 2019) can address this issue to some extent, an alternative promising approach is to develop architectures designed to generalize across domains. Early approaches focused on domain adaptation (Heimann et al., 2013;Wirkert et al., 2017) or, more generically, transfer learning (Pan and Yang, 2010) to compensate for domain shifts in the data. Other attempts have focused on converting data into a domain-invariant representation and on decoupling generic task-relevant features from domain-specific ones (Dai et al., 2017;Mitchell, 2019;Sabour et al., 2017;Sarikaya and Jannin, 2020). Generally speaking, however, ML methods trained in a specific setting (e.g. hospital) still tend to fail to generalize to new settings.
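Data augmentation, the first mitigation mentioned above, can be sketched with a few NumPy operations. The specific perturbations below (flips and a brightness-like gain) are generic illustrations of simulating acquisition variability, not the augmentations used in the cited work:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image: np.ndarray) -> np.ndarray:
    """Return a randomly perturbed copy of a 2D image in [0, 1].

    Random flips and a multiplicative intensity shift crudely mimic
    variability in device orientation and illumination.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)
    if rng.random() < 0.5:
        out = np.flipud(out)
    gain = rng.uniform(0.8, 1.2)  # simulated brightness variation
    return np.clip(out * gain, 0.0, 1.0)

# Expand one image into a batch of augmented variants.
image = rng.random((64, 64))
batch = np.stack([augment(image) for _ in range(8)])
print(batch.shape)  # (8, 64, 64)
```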

How to improve transparency and explainability? (goal 3.2)-The WHO document on Ethics & Governance of Artificial Intelligence for Health (WHO, 2021)
(see Section 3) states that "AI technologies should be intelligible […] to developers, medical professionals, patients, users and regulators" and that "two broad approaches to intelligibility are to improve the transparency of AI technology and to make AI technology explainable". In this context, transparency also relates to the requirement that "sufficient information be published or documented before the design or deployment of an AI technology and that such information facilitate meaningful public consultation and debate on how the technology is designed and how it should or should not be used". Explainability stems from the urge to understand why an algorithm produced a certain output. In fact, the complexity of neural network architectures with typically millions of parameters makes it difficult for humans to understand how these models reach their conclusions (Reyes et al., 2020). As a result, the EU's GDPR, implemented in 2018, also discourages the use of black-box approaches, thus providing explicit motivation for the development of models that provide human-interpretable information on how conclusions were reached. Interpretable models are still in their infancy and are primarily studied by the ML community (Adebayo et al., 2018;Bach et al., 2015;Koh and Liang, 2017;Shrikumar et al., 2017). These advances are being adopted within medical imaging communities in applications that are used to make a diagnosis (e.g. detecting/segmenting cancerous tissue, lesions on MRI data) (Gallego-Ortiz and Martel, 2016), and to generate reports that are on a par with human radiologists (Gale et al., 2018), for example. Open research questions relate to how to validate the explanation of the models (lack of ground truth) and how to best communicate the results to non-experts. A concept related to explainability is causality.
To date, it is generally unknown how a given intervention or change is likely to affect outcome, which is influenced by many factors even beyond the surgeon and the patient. Furthermore, randomized controlled trials (RCTs) to evaluate surgical interventions are difficult to perform (McCulloch et al., 2002). Thus, it is hard to provide the same quality of evidence and understanding of surgery as, for example, for a drug treating a common non-life-threatening condition (Hager et al., 2020). While large-scale data may help reveal relationships among many factors in surgery, correlation does not equal causation. Recent work on causal analysis (Peters et al., 2017;Schölkopf, 2019;Castro et al., 2020), however, may help in this regard.
How to address data sparsity? (goal 3.3)-One of the most crucial problems in SDS is data sparsity (see Section 2), which is strongly linked to the lack of robustness and generalization capabilities of algorithms. Several complementary approaches have been proposed to address this bottleneck. These include the crowdsourcing (Malpani et al., 2015;Heim et al., 2018;Albarqouni et al., 2016;Maier-Hein et al., 2016) and synthetic data generation (Pfeiffer et al., 2019;Ravasio et al., 2020;Wirkert et al., 2017;Rivoir et al., 2021) briefly mentioned above. Unlabeled data can also be exploited by using self-supervised and semi-supervised learning (see e.g. (Yu et al., 2019;Srivastav et al., 2020)). Self-supervised methods solve an alternate, pretext or auxiliary task, the result of which is a model or representation that can be used in the solution of the original problem. Semi-supervised methods can exploit the unlabeled data in many different ways. In (Yu et al., 2019;Srivastav et al., 2020), for example, pseudo-annotations are generated on the unlabeled data using a teacher model, and the resulting pseudo-annotated data set is then used to train another (student) model. Recent studies have further shown that exploiting the relationship across different tasks with the concept of multi-task learning may be used to address data sparsity as well. It has been demonstrated to be beneficial to jointly reason across multiple tasks (Kokkinos, 2017;Long et al., 2017;Yao et al., 2012;Sarikaya et al., 2018) and to take advantage of a combination of shared and task-specific representations (Misra et al., 2016). However, the performance of some tasks may also worsen through such a paradigm (Kokkinos, 2017). A possible solution to this problem might lie in the approach of attentive single-tasking (Maninis et al., 2019). 
Finally, meta-learning (Vanschoren, 2018;Godau and Maier-Hein, 2021) and more generally lifelong learning (Parisi et al., 2019) are further potential paradigms for addressing the problem of data sparsity in the future. Progress in this field will, at any rate, crucially depend on the availability of more public multi-task data sets, such as described by Maier-Hein et al. (2021).
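The teacher-student pseudo-labeling scheme described above can be sketched with scikit-learn. Logistic regression and the synthetic data are stand-ins chosen for brevity; the cited works use deep networks on surgical data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# Pretend only 50 samples are annotated; 750 are unlabeled; 200 are held out.
X_lab, y_lab = X[:50], y[:50]
X_unlab = X[50:800]
X_test, y_test = X[800:], y[800:]

# Teacher: trained on the small labeled set only.
teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

# Pseudo-annotate the unlabeled pool with the teacher's predictions.
pseudo = teacher.predict(X_unlab)

# Student: trained on labeled + pseudo-labeled data combined.
student = LogisticRegression(max_iter=1000).fit(
    np.vstack([X_lab, X_unlab]), np.concatenate([y_lab, pseudo]))

print(f"teacher: {teacher.score(X_test, y_test):.2f}, "
      f"student: {student.score(X_test, y_test):.2f}")
```

In practice, confidence thresholds are often applied when selecting pseudo-labels, since the student otherwise also learns the teacher's mistakes.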

How to detect, represent and compensate for uncertainties and biases? (goal 3.4)-A common criticism of ML-based solutions is the way that they handle "anomalies".
If a measurement is out-of-distribution (ood; i.e. it does not resemble the training data), the algorithm cannot make a meaningful inference, and the probability of failure (error) is high. This type of epistemic uncertainty (Kendall and Gal, 2017) is particularly crucial in medicine, as not all anomalies/pathologies can be known beforehand. As a result, current work is dedicated to this challenge of anomaly/novelty/ood detection (Adler et al., 2019). Even if a sample is in the support of the training distribution, a problem may not be uniquely solvable (Ardizzone et al., 2018) or the solution may be associated with high uncertainty. Further research has therefore been directed at estimating and representing the certainty of AI algorithms (Adler et al., 2019;Nölke et al., 2021). Future work should focus on making use of the uncertainty estimates in clinical applications and increasing the reliability of ood methods, as well as systematically understanding and addressing the issue of biases and confounders (see Section 4.4). In this context, the increased involvement of statisticians and experts from clinical epidemiology, such as in biomedical image analysis initiatives (Roß et al., 2021a), would be desirable. Adopting the necessity of reporting data biases and confounders in publications should be a natural progression for the field of SDS.
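A simple distance-based ood check illustrates the basic idea: flag inputs far from the training distribution before trusting a model's output. The Mahalanobis distance and the threshold below are a generic sketch on synthetic features, not one of the cited detection methods:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 5))  # in-distribution feature vectors

# Fit a Gaussian to the training features.
mu = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x: np.ndarray) -> float:
    """Distance of a sample from the training distribution."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

in_dist = rng.normal(0.0, 1.0, size=5)
ood = rng.normal(6.0, 1.0, size=5)  # e.g. an unseen pathology or device
threshold = 4.0                     # would be calibrated on held-out data
print(f"in-dist: {mahalanobis(in_dist):.1f}, ood: {mahalanobis(ood):.1f}")
```

Samples exceeding the threshold would be routed to a human rather than scored automatically.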
How to address data heterogeneity and complexity? (goal 3.5)-Surgeon and surgical team dynamics play a significant role in intraoperative care. While the main surgeon has the lead and makes decisions based on domain knowledge, experience and skills, anesthesiologists, assistant surgeons, nurses and further staff play crucial roles at different steps of the workflow. Their smooth, dynamic collaboration and coordination is a crucial factor for the success of the overall process. Data analytics can play a key role in quantifying these intangibles by modeling workflows and processes. Surgeon skill evaluation, personalized and timely feedback during surgical training, and optimal surgeon and patient/case or surgeon and surgical team matches are among the issues that can benefit from data analytics tools. Furthermore, data collected from multiple sources such as vital signs from live monitoring devices, electronic health records, patient demographics, or preoperative imaging modalities require analysis approaches that can accommodate their heterogeneity. Recent approaches to the fusion of heterogeneous information include the use of specialized frameworks such as iFusion. Other work has specifically focused on handling incomplete heterogeneous data with Variational Autoencoders (VAEs) (Nazábal et al., 2020). Graph neural networks (Zhou et al., 2019a) appear to be another particularly promising research direction in this regard. Here as well, however, the lack of large amounts of annotated data is a limiting factor (Raghu et al., 2019). Heterogeneity may also occur in labels (Joskowicz et al., 2019). This could potentially be addressed with fuzzy output/references as well as with probabilistic methods capable of representing multiple plausible solutions in the output, as suggested by some early work on the topic (Kohl et al., 2018;Adler et al., 2019;Trofimova et al., 2020).

How to enable real-time assistance? (goal 3.6)-Fast inference in an interventional
setting relies on (1) an adequate hardware and communication infrastructure (covered in Section 3) and on (2) fast algorithms. The trade-off between algorithm and software optimization should be finely balanced between the available edge compute power and the latency requirements of the specific application. Moving high resolution video between devices or displays inherently adds delays and should be minimized for dynamic assistance applications or when inference outputs feed control systems. This means that edge compute solutions should carefully consider the input to the display pipeline and the size of the inference models that can be loaded into an edge processor. Where latency is less critical, cloud execution of AI models has already been shown to be viable in assistive systems (e.g. Cydar EV from Cydar Medical (Cambridge, UK) for endovascular navigation, or CADDIE / CADDU from Odin Vision Ltd (London, UK) for AI assisted endoscopy). Cloud computing for real-time assistance relies on good connectivity to move data but offers the possibility of running potentially large inference models and returning results for assistance to the OR. Recent advances in the emerging research field of Tactile Internet with Human-in-the-Loop (TaHiL) (Fitzek et al., 2021), which involves intelligent telecommunication networks and secure computing infrastructure, are an enabling technology for real-time remote SDS applications. To trigger progress in the field, specific clinical applications requiring real-time support should be identified and focused on. Dedicated benchmarking competitions in the context of these applications could further guide methodological development.
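Whether a model fits an edge or cloud deployment ultimately comes down to a latency budget. A minimal sketch of measuring per-frame inference latency against such a budget (the dummy thresholding model and the 25 fps budget are illustrative assumptions):

```python
import time
import numpy as np

def dummy_inference(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a segmentation model; a real system would run an
    optimized network on edge or cloud hardware."""
    return (frame > frame.mean()).astype(np.uint8)

frame = np.random.rand(512, 512)  # stand-in for one video frame

# Measure per-frame latency over repeated runs.
latencies = []
for _ in range(50):
    t0 = time.perf_counter()
    dummy_inference(frame)
    latencies.append((time.perf_counter() - t0) * 1000)  # milliseconds

budget_ms = 40  # e.g. one frame period at 25 fps
p95 = float(np.percentile(latencies, 95))
print(f"p95 latency {p95:.2f} ms, within budget: {p95 < budget_ms}")
```

Tail latency (p95/p99) matters more than the mean here, since a single late frame can desynchronize an overlay from the live video.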

How to train and apply algorithms under regulatory constraints? (goal 3.7)-
When an SDS data set contains personal medical data, an open challenge lies in how to perform data analytics and train ML models without sensitive information being exposed in the results or models. A general solution that is gaining increasing traction in ML is differential privacy (Dwork et al., 2006). This offers a strong protection mechanism against linkage, de-anonymization and data reconstruction attacks, with rigorous privacy guarantees from cryptography theory. A limitation of differential privacy can be seen in the resulting compromise in terms of model accuracy, which may conflict with accuracy targets.
Differential privacy may ultimately be mandatory for federated learning and for publicly releasing SDS models built from personal medical data. Since patients have the right to delete their data, privacy questions also arise regarding models that were trained on their data. In addition, it might be an attractive business model for companies to sell their annotated data or make them publicly available for research purposes. This requires methods to detect whether specific data has been used to train models, e.g. using concepts of "radioactive data" (Sablayrolles et al., 2020), or methods that detect whether a model has forgotten specific data (Liu and Tsaftaris, 2020). A complementary approach to preserving privacy is to work with a different representation of the data.
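The core of differential privacy (Dwork et al., 2006) can be sketched with the classic Laplace mechanism: calibrated noise is added to a query so that any single patient's contribution is masked. The statistic and the numbers below are illustrative, and real deployments compose budgets across many queries:

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_mean(values: np.ndarray, lower: float, upper: float, epsilon: float) -> float:
    """Differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper]; the sensitivity of the mean of
    n bounded values (how much one record can move it) is (upper - lower) / n.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

# Hypothetical operation durations in minutes.
durations = rng.uniform(30, 240, size=1000)
released = dp_mean(durations, 30, 240, epsilon=1.0)
print(f"true mean {durations.mean():.1f}, dp mean {released:.1f}")
```

The accuracy cost mentioned above is visible directly: smaller epsilon (stronger privacy) means a larger noise scale and a noisier released statistic.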

How to ensure meaningful validation and evaluation? (all goals)-Validation, defined as the demonstration that a system does what it has been designed to do, as well as evaluation, defined as the demonstration of the short-, mid- and long-term added value of the system, are crucial for the development of SDS solutions. The problem with the assessment of ML methods today is that models trained on a particular data set are evaluated on new data taken from the same distribution as the training data. Although recent efforts have been made in healthcare (McKinney et al., 2020) to include test data from different clinical sites, these still remain limited. This situation poses a challenge particularly for healthcare applications, as real-world test data, after the model is deployed for clinical use, will typically not have ground-truth annotation, making its assessment difficult (Castro et al., 2020). A recent example of this is Google Health's deep learning system that predicts whether a person might be at risk for diabetic retinopathy. In this case, after its deployment at clinics in rural Thailand, despite having high theoretical accuracy, the tool was reported to be impractical in real-world testing (TechCrunch, 2020). In the future, evaluation of methods should increasingly be performed in multi-center settings and incorporate the important aspects of robustness to domain shifts, data imbalance and bias. Global initiatives such as MLCommons and its Medical Working Group will play a central role in designing benchmarks and proposing best practices in this regard. Furthermore, matching performance metrics to the clinical goals should be more carefully considered, as illustrated in recent work. Finally, specific technical aspects (e.g. explainability, generalization) should be comparatively benchmarked with international challenges and covered at dedicated workshops. In this context, acquiring dedicated sponsor money for annotations could help generate more high-quality public data sets.
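A mechanical step toward the multi-center evaluation advocated above is site-aware cross-validation: every fold tests on clinical sites never seen during training. A sketch with scikit-learn's GroupKFold (the data and site labels are synthetic placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
# Hypothetical hospital labels: 6 sites with 100 samples each. All samples
# from one site stay together, so each fold tests on an unseen site.
sites = np.repeat(np.arange(6), 100)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=6), groups=sites)
print(f"per-site test accuracy: {np.round(scores, 2)}")
```

With real multi-center data, the spread across these per-site scores would expose the domain-shift sensitivity that a single pooled accuracy hides.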

Clinical translation
The process of clinical translation from bench to bedside has been described as a valley of death, not only for surgical (software) products, but for biomedical research in general (Butler, 2008). In this section, we will begin by describing current practice and key initiatives in clinical translation of SDS. We elaborate on the concept of "low-hanging fruit" that may be reached in a comparatively straightforward manner through collaboration of surgeon scientists, computer scientists and industry leaders. Finally, we will outline current challenges and next steps for those low-hanging fruit to cross the valley of death, turning SDS applications from optional translational research projects into key elements of the product portfolio of modern OR vendors, which in turn will increase engagement on the part of researchers, industry, funding agencies and regulatory bodies alike.

Current practice
Clinical translation of products developed through SDS is regulated under existing rules and guidelines. Ultimately, systems or products using SDS components must be able to provide value before, during or after surgery or interventions. Validating such capabilities requires prospective clinical trials in real treatment practices, which require ethics and safety approval by relevant bodies as well as adherence to the software standards described in Section 5.4. System documentation and reliability are critical to pass through such approval procedures, although approval can, in exceptional cases, be obtained for research purposes without proof of code stability.
From a clinical research perspective, meta-analyses of RCTs are considered the gold standard. However, the field of surgery exhibits a notable lack of high-quality clinical studies as compared to other medical disciplines (McCulloch et al., 2002). Although long-term clinical studies are a common prerequisite for clinical translation, the number of existing clinical studies in AI-based medicine is, despite intense research, extremely low (Nagendran et al., 2020). As a result, most current clinical studies in the field are based on selected data that are retrospectively analyzed, leading to a lack of high-quality evidence that in turn hampers clinical progress. A recent scoping review on AI-based intraoperative decision support in particular named the small size, single-center provenance and questionable representativeness of the data sets, the lack of accounting for variability among human comparators, the lack of quantitative error analysis, and a failure to segregate training and test data sets as the prevalent methodological shortcomings (Navarrete-Welton and Hashimoto, 2020).
Despite these shortcomings, it should be noted that not all questions that arise in the process of clinical translation of an algorithm necessarily need to be addressed by RCTs. For example, a recent DL algorithm to diagnose diabetic retinopathy was approved by the FDA based on a pivotal cross-sectional study (Abràmoff et al., 2018). Translational research on SDS products for prognosis also leverages existing methodology on prospective and retrospective cohort studies for the purposes of internal and external validation.
Generally speaking, the field of SDS still faces several domain-specific impediments. For instance, digitalization has not permeated the OR and the surgical community to the same degree as other areas of medicine. A lack of standardization of surgical procedures hampers the creation of standardized annotation protocols, an important prerequisite for large-scale multi-center studies. Pioneering clinical success stories are important motivators to help set in motion a virtuous circle of advancement in the OR and beyond.

Key initiatives and achievements
The following section will provide an overview of existing SDS products and clinical studies in SDS.

SDS products:
Over the past few years, modest success in the clinical translation and approval of SDS products has been achieved, as summarized in Table 5. These products predominantly provide decision support in endoscopic imaging. Endoscopic AI (AI Medical Service, Tokyo, Japan) and GI Genius™ (Medtronic, Dublin, Ireland) support gastroenterologists in the detection of cancerous lesions, although the former struggles with a low positive predictive value (Hirasawa et al., 2018). Other successful applications include OR safety algorithms and computer vision-based data extraction.
Translational progress in academia: While most academic work has focused on preoperative decision support, we place a particular focus here on intraoperative assistance. Table 6 shows several exemplary academic studies that illustrate how far SDS products have been translated toward clinical practice in this regard.
Intraoperative assistance: A recent review on AI for surgery mainly found studies that use ML to improve intraoperative imaging such as hyperspectral imaging or optical coherence tomography (Navarrete-Welton and Hashimoto, 2020). Further notable intraoperative decision support efforts have focused on hypoxemia prevention (Lundberg et al., 2018), sensor monitoring to support anesthesiologists with proper blood pressure management (Wijnberge et al., 2020) and intelligent spinal cord monitoring during spinal surgery (Fan et al., 2016). A number of models have been developed to promote safety in laparoscopic cholecystectomy, a very common and standardized minimally invasive abdominal procedure.
For instance, a model for bounding box detection of hepatocystic anatomy was recently tested in the operating room (Tokuyasu et al., 2021). Another example of SDS for safe cholecystectomy is DeepCVS, a neural network trained to semantically segment hepatocystic anatomy and assess the criteria defining the CVS (Mascagni et al., 2020b). A recent study based on 290 laparoscopic cholecystectomy videos from 37 countries showed that DL-based image analysis may be able to identify safe and dangerous zones of dissection. Finally, a cross-sectional study using DL algorithms developed on videos of the surgical field from more than 1000 cholecystectomy procedures from two institutions showed an association between disease severity and surgeons' ability to verify the CVS (Korndorffer et al., 2020). Another example of intraoperative decision support is a study by Harangi et al. (2017), who developed a neural network-based method to classify a structure specified by a surgeon (by drawing a line in the image) as either the uterine artery or the ureter. The authors reported a high accuracy, but the study had a cross-sectional design with a convenience sample. In fact, convenience samples are the norm in most existing SDS studies addressing recognition of objects or anatomical structures in the surgical field. This sampling mechanism makes the findings susceptible to selection bias, which compromises the generalizability and external validity of the methods.

Perioperative decision support and prediction:
A selection of studies on perioperative assistance can be found in Appendix D. One important application of academic SDS is clinical decision support systems (CDSS) that integrate various information sources and compute a recommendation for surgeons about the optimal treatment option for a given patient. Many of these CDSS are prediction systems that integrate clinical, radiological and pathological attributes collected in a routine setting into a mathematical model and weigh these parameters automatically to achieve a novel risk stratification (Shur et al., 2020). Trained on a specifically selected subpopulation of patients, these prediction systems may help improve current classification systems in guiding surgical decisions (Tsilimigras et al., 2020). Relevant information such as overall and recurrence-free survival (Schoenberg et al., 2020) or the likelihood of intra- and postoperative adverse events (Bhandari et al., 2020) can be assessed quickly via online applications such as pancreascalculator.com (van Roessel et al., 2020). In contrast to these score-based prediction systems, ML-based systems are more flexible. The most prominent ML-based system, IBM's Watson for Oncology, is based on NLP and iterative features and demonstrated good accordance with treatments selected by multidisciplinary tumor boards in hospitals in India (Somashekhar et al., 2018) and South Korea. Weaknesses of this system include the necessity of skilled oncologists to operate the program, low generalizability to different regions, and the fact that not all subtypes of a specific cancer can be processed (Yao et al., 2020; Strickland, 2019).
Another important application besides decision support is the prediction of adverse events. A widely discussed work showed that DL may predict acute kidney injury up to 48 hours in advance (Tomašev et al., 2019). In the intensive care unit (ICU), where surgeons face enormous quantities of clinical measurements from multiple sources, such as monitoring systems, laboratory values, diagnostic imaging and microbiology results, data-driven algorithms have demonstrated the ability to predict circulatory failure (Hyland et al., 2020).
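To make the idea of such early-warning systems concrete, the following minimal sketch flags sustained hypotension in a stream of mean arterial pressure readings. The 65 mmHg threshold and three-reading window are illustrative placeholders, not clinically validated parameters; real systems such as the one described by Hyland et al. (2020) use far richer multivariate models.

```python
def early_warning(map_values, window=3, threshold=65):
    """Flag the first time index at which the mean arterial pressure (MAP)
    stays below `threshold` mmHg for `window` consecutive readings.
    Returns None if no sustained hypotension is detected. The parameters
    are illustrative, not clinically validated."""
    run = 0
    for i, v in enumerate(map_values):
        run = run + 1 if v < threshold else 0
        if run >= window:
            return i - window + 1  # index where the sustained episode began
    return None

readings = [78, 72, 64, 63, 62, 70]   # simulated MAP trace (mmHg)
print(early_warning(readings))        # episode begins at index 2
```

Even this toy example illustrates the core design trade-off of such systems: a shorter window raises sensitivity at the cost of more false alarms, which is exactly the alarm-fatigue problem that data-driven ICU models aim to mitigate.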

Low-hanging fruit
In light of the lack of a critical number of clinical success stories, a viable approach to clinical translation should initially focus on "low-hanging fruit". We believe the following criteria influence the likelihood of successful translation of an SDS application: high patient safety, technical feasibility - especially regarding data needs and performance requirements - easy workflow integration, high clinical value and high business value to encourage industry adoption. Low-hanging fruit typically also avoid being classified as high-risk medical products, thereby reducing regulatory demands and development barriers. However, it is difficult to satisfy all of these often conflicting criteria simultaneously. For example, applications of significant clinical value, such as real-time decision support, are technically highly challenging. By contrast, low-level video processing applications such as uninformative frame detection are technically simple but of limited clinical value. Low-hanging fruit are those SDS applications that strike a good balance between most or all of these criteria.
An example for a low-risk medical device in the broader scope of SDS is the aforementioned GI Genius that uses AI for real-time detection and localization of polyps during colonoscopy, supporting the examination but not replacing the clinical decision making and diagnostics by clinicians. Considering the low risk to patients, GI Genius is classified as a Class II medical device (with special controls) by the FDA (FDA, 2021b).

Different types and opportunities:
In surgery, a framework that may help determine the next steps for low-hanging fruit is the digital technology framework that categorizes data-centric product innovations as descriptive, diagnostic, predictive and prescriptive, as detailed in Section 5. Currently, the overwhelming focus of SDS researchers is on the prescriptive technology area - for example on tools that provide surgical decision support or predict adverse events. Changing the development lens from prescriptive to descriptive SDS applications, however, may open up entirely new avenues. For instance, a low-hanging fruit may lie in a descriptive decision support tool that informs surgeons about how many surgeons performed certain steps within an intervention and with what consequences. Such a data-centric SDS product would not require embedded surgical expertise in order to provide value to the surgeon, but only a database of surgical videos and automated recognition of anatomical structures and surgical instruments, which is technically feasible. In essence, instead of the very difficult automation of surgical decisions, value can be found in providing surgeons and surgical teams with moment-to-moment risk stratification data to facilitate their decisions. An additional benefit of this approach is that it can be combined with real-time data acquisition on how surgeons interact with the risk stratification data, which would greatly facilitate the development of both predictive and prescriptive decision support tools.
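The descriptive flavor of such a tool can be sketched in a few lines: aggregate, per recognized surgical approach, how often it was used and what outcomes followed, and present those statistics to the surgeon. The approach names and case records below are entirely hypothetical.

```python
from collections import defaultdict

# Hypothetical records from a surgical video database: the dissection
# approach recognized in each case and whether a complication occurred.
cases = [
    ("fundus-first", True), ("fundus-first", False),
    ("ct-first", False), ("ct-first", False), ("ct-first", True),
]

def step_statistics(records):
    """Descriptive summary: per approach, the number of cases and the
    observed complication rate -- information presented to the surgeon,
    with the decision left entirely to them."""
    counts = defaultdict(lambda: [0, 0])  # approach -> [cases, complications]
    for approach, complication in records:
        counts[approach][0] += 1
        counts[approach][1] += int(complication)
    return {a: {"cases": n, "complication_rate": c / n}
            for a, (n, c) in counts.items()}

print(step_statistics(cases))
```

Note that the code contains no embedded surgical expertise: it merely counts and normalizes, which is precisely what keeps the surgeon fully responsible for the decision.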
Importantly, presenting statistical data and evidence-based risk stratification information to the surgeon would also follow a different regulatory path than a prescriptive SDS product that offers surgical decisions based on an AI database grounded in surgical decision making. The data-focused product leaves the surgeon fully responsible, while the decision-based product makes it questionable who is responsible if the surgeon followed an AI-based decision and there was a poor outcome. Another benefit of focusing on descriptive technologies is that the technology adoption hurdle for surgeons is much smaller when trusting descriptive statistics than when trusting an AI-based prescriptive decision support tool.
An ML-based descriptive low-hanging fruit could be data-driven surgical reporting and documentation. Surgical procedures are currently documented as one to two pages of text. While a six- to eight-hour video will not serve as a report in itself, SDS may help extract relevant information from such a video by automatically documenting important steps in the procedure. Here, computer vision algorithms for the recognition of surgical phases and instruments may be used to extract meta-information from videos (Mascagni et al., 2021b).
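The kind of meta-information extraction described above can be illustrated with a minimal sketch: given per-frame phase predictions from a (hypothetical) recognition model, consecutive identical labels are collapsed into a timeline suitable for a structured report. The phase names and the one-frame-per-second sampling rate are purely illustrative.

```python
from itertools import groupby

def summarize_phases(frame_labels, fps=1.0):
    """Collapse per-frame phase predictions into a timeline of
    (phase, start_s, end_s) segments for a structured report."""
    segments = []
    t = 0
    for phase, run in groupby(frame_labels):
        n = len(list(run))
        segments.append((phase, t / fps, (t + n) / fps))
        t += n
    return segments

# Per-frame output of a hypothetical phase recognition model, 1 frame/s
labels = ["preparation"] * 3 + ["dissection"] * 5 + ["clipping"] * 2
for phase, start, end in summarize_phases(labels):
    print(f"{phase}: {start:.0f}s-{end:.0f}s")
```

A real system would of course operate on model outputs rather than hand-written labels and would smooth over transient misclassifications, but the report-generation step itself remains this simple aggregation.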
An ML-based predictive low-hanging fruit could lie in the optimization of OR logistics. Prediction of procedure time, either preoperatively or using intraoperative sensor data, may not improve patient outcome, but could provide value to hospital managers if it helps cut costs in the OR by optimizing patient volume (Aksamentov et al., 2017; Bodenstedt et al., 2019b; Twinanda et al., 2019). This, too, harbors low risk for patients and has a low barrier to market entry. Furthermore, the reference information, i.e., time between incision and suture, is already documented in most hospitals, so no laborious annotation by surgical experts is necessary to train the respective ML algorithms. Since OR management tools already exist, SDS applications could even yield success stories within existing tools without having to establish entirely new software. Improvements in patient safety may already result from a simple tool that combines SDS algorithms for object recognition in laparoscopic video (e.g. gauze, specimen bag or suture needle) with a warning for surgeons and scrub nurses if these objects are introduced into the patient's abdomen but not removed afterwards. Since such an SDS application warns clinical staff but does not itself perform an action on the patient, the risk to the patient is inherently low. Here, a combination of surgical knowledge (which objects are introduced into the patient's body, and when?) with SDS algorithms (which objects can robustly be detected?) and an unobtrusive user interface with a low false alarm rate may result in a low-hanging fruit. Along these lines, automation of the surgical checklist (Conley et al., 2011) would be a technically feasible SDS application with high clinical value.
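As a toy illustration of such procedure-time prediction, one could estimate the remaining duration from the currently recognized phase and historical per-phase averages. The phases and mean durations below are invented for illustration; a real system would learn from intraoperative sensor data rather than use fixed means.

```python
# Hypothetical historical mean durations (minutes) per surgical phase
PHASE_MEANS = {"preparation": 10, "dissection": 45, "clipping": 8, "closure": 12}
PHASE_ORDER = ["preparation", "dissection", "clipping", "closure"]

def remaining_minutes(current_phase, minutes_in_phase):
    """Naive remaining-duration estimate: time left in the current phase
    (never below zero) plus the mean durations of all later phases."""
    i = PHASE_ORDER.index(current_phase)
    left_in_phase = max(PHASE_MEANS[current_phase] - minutes_in_phase, 0)
    return left_in_phase + sum(PHASE_MEANS[p] for p in PHASE_ORDER[i + 1:])

print(remaining_minutes("dissection", 30))  # 15 + 8 + 12 = 35
```

Even this crude baseline highlights why intraoperative phase recognition adds value for OR scheduling: the estimate sharpens as the procedure advances, whereas a purely preoperative prediction stays fixed.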

Surgical robotics as catalyst:
The impending success of next-generation surgical robotics in the OR may bring further opportunities for the clinical translation of SDS. The da Vinci® surgical system (Intuitive Surgical Inc., Sunnyvale, CA, USA) and its upcoming competitors lay the foundation for systematic data capture as well as surgical guidance by information augmentation in the OR. A relatively low-hanging fruit with benefit to the surgeon in the domain of surgical robotics may be an automated camera guidance system, as suggested by Wagner et al. On the one hand, the risk to the patient from poor camera positioning is low compared to that of invasive tasks such as suturing. On the other hand, correcting the camera position is currently a highly disruptive task for the surgeon. The first products for autonomous endoscopic camera control are now emerging in robotic surgery, such as the FDA-approved system from TransEnterix (Morrisville, NC, USA).

Current challenges and next steps
As highlighted in several previous publications (2018a; Hager et al., 2020), clinical applications for SDS are manifold, ranging from pre- and intraoperative decision support to context-aware assistance and surgical skills training. The clinical translation-related goals generated by the consortium as part of the Delphi process are provided in Table 7. The following aspects deserve particular attention:

How to catalyze clinical translation of SDS? (goals 4.1/4.2)-Clinical data is recognized as "the resource most central to health-care progress" (Institute of Medicine (USA) Roundtable on Value & Science-Driven Health Care, 2010). What is needed is thus a cultural shift toward data acquisition, annotation and analysis within a well-defined data governance framework as a primary clinical task (August et al., 2021). The allocation of economic, infrastructural and personnel resources within hospitals appears to be a non-negotiable requirement for this purpose. The need to create value from large amounts of representative data, both for de novo development/validation and for external validation studies, further necessitates multi-institutional collaborations. Researchers in other domains, for example genomics and bioinformatics, have achieved such collaborations; SDS would benefit from adopting relevant aspects of these domains' research cultures. In addition, enabling explicit academic recognition for developing rigorously annotated data sets can facilitate data resources for research in SDS, as discussed in Section 4. Paving the way for short-term clinical success stories as well as long-term clinical translation further requires SDS applications to be integrated into clinical workflows. In fact, the sparsity of studies on SDS solutions for intraoperative care illustrates the challenge of conducting multidisciplinary research while prioritizing the patient.
Therefore, research on SDS products should consider the impact on workflow early in product development and closely engage relevant stakeholders (see Table 1). Impactful success stories could then be generated by focusing on low-hanging fruit presented in the previous section. These, in turn, would contribute to building public trust in SDS and boost public enthusiasm to spark patient demand.
How to improve knowledge transfer among different stakeholders? (goal 4.3)-The creation of interdisciplinary networks involving the different stakeholders and the regular organization of SDS events in conjunction with both technical and medical conferences is key to improving knowledge transfer between the groups. Such events should, in part, be dedicated to specific questions, such as annotation guidelines, data structures or good practices with respect to external validation. As a means for actively disseminating, discussing, and promoting new insights in the field of SDS, a well-curated community web platform should be established as the central information hub. One could even go further and offer e.g. a prize for clinical trials demonstrating SDS success. A good means for public outreach could be the hosting of public days focused on a particular topic at major conferences in the field, as a way of creating awareness for that topic, or campaigns e.g. in the vein of "Stop the Bleed" (ACS Committee on Trauma).

How to train key SDS personnel? (goal 4.4)-In order to facilitate clinical translation of SDS in the long term, it will further be crucial to promote the transdisciplinary training of future surgical data scientists and thereby establish SDS as a career path. Computer scientists will have to enter ORs on a regular basis to understand real clinical problems and to get an impression of the obstacles to clinical translation. Similarly, surgeons will have to understand the basic principles, capabilities and limits of data science techniques to identify solvable clinical problems and suitable applications for SDS. A viable path to improve knowledge transfer would be to establish SDS as a commonly respected career path in hospitals. In this context, both technical and clinical disciplines should be complemented by knowledge and expertise in clinical research methodology, i.e., epidemiology and biostatistics. Moreover, human factors engineering and human-computer interaction researchers should be integrated into the community. Setting up such an SDS career path should also involve defining the specific skills an 'AI-ready' clinician should have. A curriculum should place a specific focus on medical statistics, covering confounding variables, risk correction and data biases, as well as on regulatory issues (e.g. SaMD). On top of the research-oriented positions, we should further seek to establish SDS-related jobs for data acquisition, management and annotation, specifically in university hospitals.
How to ensure high-quality external validation of SDS applications? (goals 4.5-4.7)-A critical pitfall with clinical prediction models, which include models for diagnosis and prognosis, is the unbridled proliferation of de novo development and validation studies alongside scant external validation studies (Adibi et al., 2020). Research to support regulatory approval of SDS products, i.e., in order to market these products, would typically address external validation. However, advances in clinical care are not restricted to marketed products. It is therefore necessary for the research community to conduct not only de novo development and validation studies but also well-designed external validation studies. Past experience with clinical prediction models shows the need for creative solutions. While some solutions, such as "living registries", have been proposed (Adibi et al., 2020), a proactive effort by the SDS community to develop effective solutions that allow for consistent and uniform external validation could be a transformative contribution. The status quo, summarized in a review of existing literature on AI-based intraoperative decision-making, shows that the SDS community has not yet addressed the pitfall of inadequate external validation studies (Navarrete-Welton and Hashimoto, 2020). This challenge is systematically addressed when the end goal of the translational research is regulatory approval to market an SDS product; the regulatory agency serves as a steward in this case. Similar stewardship may benefit translational research in SDS that is not intended to support regulatory approval. Finally, it is important to develop new performance metrics for AI algorithms that quantify clinically relevant parameters currently not accounted for in outcome validation studies. One particular challenge lies in the assessment of long-term outcomes.
Many established metrics, such as 5-year-survival after a surgical intervention for cancer, may not be immediately available following surgery. Here, ML techniques can help by capturing data patterns that could serve as potential surrogate measures: Surgical video or motion data localized to anatomy through imaging studies may be used to identify activities or events that increase the risk of cancer cell seeding and subsequent metastasis and thus predict the long-term outcome.
How to ensure ethical and legal guidance? (goals 4.8/4.9)-With data-driven clinical practice about to change profoundly, unprecedented ethical and legal questions pertaining to both the regulation of medical AI and its practical use will arise. Moving forward, liability and medical negligence/insurance regulations need to be adapted for data-driven clinical practice. A recent survey among Dutch surgeons revealed privacy and liability concerns as significant grounds for objection to video and audio recording of surgical procedures (van de Graaf et al., 2020), reinforcing the importance of clear regulatory frameworks for better clinical acceptance. New regulations will have to go much further than these current considerations, with a particular focus on cases of AI failure, human rejection of AI recommendations, or potentially the omission of AI (European Parliament, 2020). Notably, the FDA recently put forth an Artificial Intelligence and Machine Learning (AI/ML) Software as a Medical Device Action Plan (FDA, 2021a). These regulatory issues strongly interconnect with the previously raised issues of trust in, as well as transparency and explainability of, AI models, which have also been raised in the very recent WHO report Ethics & Governance of Artificial Intelligence for Health (WHO, 2021). An ethical and human rights-based framework intended to guide the development and use of AI was further proposed by Fjeld et al. (2020), taking into account eight key themes: privacy, accountability, safety/security, transparency/explainability, fairness and non-discrimination, human control of technology, professional responsibility, and promotion of human values. Moreover, ethical and moral considerations regarding the democratization of data and/or AI model access will be necessary.
In the specific context of surgery, first guidance on the ethical implications of integrating AI algorithms into surgical training workflows has recently become available (Collins et al., 2021). Similarly, new concepts for obtaining patient consent to data sharing that take into account the dynamics and unforeseeability of data usage in future SDS applications need to be established. One possible route is the introduction of a data donor card, analogous to organ donor cards, as suggested in Section 4.4. Both patient- and healthcare professional-centric ethical and legal considerations are likely to have a large impact on the public perception of and trust in SDS, which needs to be strengthened to raise patient demand. Above all, patient safety must be supported by the development of well-considered regulatory frameworks.
In summary, a multi-pronged approach to address challenges that can catalyze rapid advances in SDS and to develop solutions to problems considered low-hanging fruit will be crucial to the future of SDS as a scientific field. The introduction of initial features that provide clear benefits can facilitate advanced changes. To this end, a compositional approach may be pursued wherein complex SDS products reuse simpler AI models that have been previously approved and adopted in clinical care. Once a number of high value applications are established and there is hospital buy-in, a virtuous circle of SDS can be expected to begin, enabling more applications, higher volume data collection, stronger models, streamlined regulation, and better acceptance.

Discussion
Fifteen years have passed since the vision of the OR of the future was sketched for the year 2020 (Cleary et al., 2004). A central goal of the SDS 2019 workshop was to revisit the paper and report produced by Cleary et al. (2005) and Mun and Cleary (2005) and to investigate where we stand, what has hindered us from achieving some of the envisioned goals, and which new trends had not been considered at the time.
When asked "What has really changed when you enter the OR of today compared to the setting in 2004?", participants came to the conclusion that they do not perceive any disruptive changes. Improvements were felt to be of a rather incremental nature, including advances in visualization (e.g. 3D visualization and 4K video imaging (Ceccarelli et al., 2018; Dunkin and Flowers, 2015; Rigante et al., 2017)) and improvements in tissue dissection, which is now safer, easier and faster to perform thanks to, for example, ultrasound scissors and impedance-controlled electrosurgery. None of these innovations includes a relevant AI or ML aspect. Some developments did not even deliver the envisioned benefits: staplers today are far more sophisticated than those of 10 years ago, yet the problem of anastomotic leakage remains relevant (Stamos and Brady, 2018). The following paragraphs put the six main topics of the 2004 workshop into today's perspective.

Operational efficiency and workflow:
Core problems identified in 2004 were the "absence of a standard, computerized medical record for patients that documents their histories and their needs" as well as "multiple and disparate systems for tracking related work processes". While these problems have remained until today (see Section 3), the challenge of integrating the different information sources along the entire patient pathway has meanwhile been widely acknowledged. Emerging standards like HL7 FHIR and the maturing efforts of IHE form a solid base for future developments. However, standards alone are not sufficient to solve the problem; hospitals need to make data acquisition, exchange and accessibility a requirement. HIT that enables fast deployment of tools for data acquisition, annotation and processing should be seen as a core service enabling cutting-edge research. By centralizing such efforts, data pools can be maintained across many projects instead of creating isolated databases. This brings with it the need to standardize regulatory workflows. Getting access to data for research is often highly challenging. By outlining clear guidelines and codes of conduct, time spent on formalities can be cut while reducing uncertainties regarding the right and wrong ways to handle sensitive data. Finally, the prevalence of unstructured data needs to be decreased in order to increase data accessibility. At this point, this also seems to be a matter of user interfaces: by providing clinicians with tools to rapidly create structured reports, reliance on free text can be reduced. This, however, requires training and acceptance by clinical personnel, which could be increased through education in data science topics.

Systems integration and technical standards:
OR integration has been the aim of multiple international initiatives, such as OR.NET, the Smart Cyber Operating Theater (SCOT) project (Iseki et al., 2012) and the Medical Device "Plug-and-Play" (MD PnP) Interoperability Program. Despite these ongoing efforts, however, we are still far from an OR in which "all machines and imaging modalities can talk to each other", as postulated in 2004. Again, interoperability with intraoperative devices should be viewed by clinical management as a prerequisite, and as an investment in future workflow and cost optimization. Emerging standards like SDC provide a means to enable data exchange; however, more work needs to be invested in the creation of platforms that enable dynamic reactions to events and complex interactions.

Telecollaboration:
While the OR of the twenty-first century connects many different individuals from various disciplines, telecollaboration has evolved only slightly during the last one and a half decades, and a genuine breakthrough has not yet been achieved (Choi et al., 2018). Many of the impediments can be attributed to missing technical developments (e.g. regarding data compression and latency), coordination issues and knowledge gaps on the part of the prospective users, as well as the aforementioned lack of data standardization. It is to be hoped that coming improvements in intelligent telecommunication networks (e.g. 5G) might trigger future progress in telecollaboration.

Robotics and surgical instrumentation:
In 2020, numerous surgical procedures, including major surgery on the esophagus, pancreas or rectum, can feasibly be performed using surgical robots. In striking contrast, the actual use of surgical robotics is still marginal. A number of high-quality controlled trials failed to prove superiority, making the use of surgical robotics in many cases difficult to justify (Roh et al., 2018). Another reason for the slow progress may lie in the lack of competition in hardware. Since the discontinuation of the development of the ZEUS device in 2003, the field has clearly been dominated by the da Vinci system. Only recently have truly competitive systems such as the Senhance™ (TransEnterix) or the Versius® (Cambridge Medical Robotics Ltd., Cambridge, UK) system begun to emerge (Peters et al., 2018). It will be exciting to see whether a broader range of technical solutions, along with, perhaps, a stronger interlocking with next-generation intraoperative imaging, will stimulate this particular aspect of the next OR.

Intraoperative diagnosis and imaging:
While intraoperative imaging appeared very promising in 2004, the modest successes that have been made in that area are mostly related to mobile X-Ray based devices and drop-in devices in robotics (Diana et al., 2017;Goyal, 2018). The pivotal problem of matching pre-and intraoperative images still remains, as does the unsolved issue of adaptive real-time visualization during intraoperative deformation of soft tissue. One emerging and very promising field is the field of biophotonics (see Section 3). Benefiting from a lack of ionizing radiation, low hardware complexity and easy integrability into the surgical workflow, biophotonics has yielded an increasing number of success stories in intraoperative imaging (Bruins et al., 2020;Neuschler et al., 2017).

Surgical informatics:
In 2004, the term SDS had not been invented. At that time, surgical informatics was defined as the collection, storage/organization, retrieval, sharing, and rendering of biomedical information relevant to the care of the surgical patient, with the aim of providing comprehensive support to the entire healthcare team. Since the beginnings of the field of computer-aided surgery, however, AI and in particular ML have arisen as new enabling techniques that were not in focus 15 years ago. While these techniques have begun revolutionizing other areas of medicine, in particular radiology (Kickingereder et al., 2019; Shen et al., 2017), SDS still suffers from a notable absence of success stories. This can be attributed to a number of challenges, specifically related to high-quality and high-volume data annotation, as well as intraoperative data acquisition and analysis and surgical workflow integration, as detailed in Sections 3-6.
Overall, the comparison between the workshop topics discussed in 2004 and 2019 revealed that the most fundamental perceived difference relates to how the future of surgery is envisioned by experts in the field. While discussions in 2004 were mainly centered around devices, AI is now seen as a key enabling technique for the future OR. This article has therefore been centered around technical challenges related to applying AI/ML techniques to surgery. A core challenge now is to put the vision of SDS into clinical practice. The large number of relevant SDS stakeholders (Table 1) as well as the large number of goals with high priority (Tables 2, 3, 4 and 7), as compiled by the international Delphi expert panel, illustrate that the hurdles are high. With the presented concrete recommendations for addressing the complexity of SDS and moving forward, we hope to support the SDS community in overcoming existing barriers and eventually achieving clinical translation.

Appendix table: List of publicly accessible and annotated surgical data repositories, assigned to the categories (1) robotic minimally-invasive surgery, (2) laparoscopic surgery, (3) endoscopy, (4) microscopic surgery, and (5) diverse procedures, e.g. partial nephrectomy, totally endoscopic coronary artery bypass graft, intra-abdominal exploration (Stoyanov et al., 2005; Lerotic et al., 2008; Mountney et al., 2010; Pratt et al., 2010; Stoyanov et al., 2010; Giannarou et al., 2013; Ye et al., 2017).

Figure: Building blocks of a surgical data science (SDS) system. Perception: Relevant data is perceived by the system (Section 3). In this context, effectors include humans and/or devices that manipulate the patient, including surgeons, the operating room (OR) team, the anesthesia team, nurses and robots. Sensors are devices for perceiving patient- and procedure-related data such as images, vital signals and motion data from effectors. Data about the patient includes preoperative images and laboratory data, for example. Domain knowledge serves as the basis for data interpretation (Section 4). It comprises factual knowledge, such as previous findings from studies, clinical guidelines or hospital-specific standards related to the clinical workflow, as well as practical knowledge from previous procedures. Interpretation: The perceived data is interpreted in a context-aware manner (Section 5) to provide real-time assistance (Section 6). Applications of SDS are manifold, ranging from surgical education to various clinical tasks, such as early detection, diagnosis, and therapy assistance.

Table 2: Mission statement corresponding to technical infrastructure (Sec. 3) along with corresponding goals. The distribution of priorities (from left to right: not a priority, low priority, medium priority, high priority, essential priority) as rated by the participants of the Delphi process is depicted for each goal.

Table 3: Mission statement corresponding to data annotation and sharing (Sec. 4) along with corresponding goals. The distribution of priorities (from left to right: not a priority, low priority, medium priority, high priority, essential priority) as rated by the participants of the Delphi process is depicted for each goal.

Table 4: Mission statement corresponding to data analytics (Sec. 5) along with corresponding goals. The distribution of priorities (from left to right: not a priority, low priority, medium priority, high priority, essential priority) as rated by the participants of the Delphi process is depicted for each goal.

Table 5: Selection of SDS products with machine learning (ML)-based components as of October 2020.

Table 7: Mission statement corresponding to clinical translation (Sec. 6) along with corresponding goals. The distribution of priorities (from left to right: not a priority, low priority, medium priority, high priority, essential priority) as rated by the participants of the Delphi process is depicted for each goal.