A Scoping Review of Artificial Intelligence Research in Rhinology

Background: A considerable volume of possible applications of artificial intelligence (AI) in the field of rhinology exists, and research in the area is rapidly evolving.
Objective: This scoping review aims to provide a brief overview of all current literature on AI in the field of rhinology. Further, it aims to highlight gaps in the literature for future rhinology researchers.
Methods: Ovid MEDLINE (1946-2022) and EMBASE (1974-2022) were searched from January 1, 2017 until May 14, 2022 to identify all relevant articles. The Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews checklist was used to guide the review.
Results: A total of 2420 results were identified, of which 62 met the eligibility criteria. A further 17 articles were included through bibliography searching, for a total of 79 articles on AI in rhinology. The number of publications increased each year, from 3 articles in 2017 to 31 in 2021. Articles were produced by authors from 22 countries, with a relative majority coming from the USA (19%), China (19%), and South Korea (13%). Articles were placed into 1 of 5 categories: phenotyping/endotyping (n = 12), radiological diagnostics (n = 42), prognostication (n = 10), non-radiological diagnostics (n = 7), and surgical assessment/planning (n = 8). The diagnostic or prognostic utility of the AI algorithms was rated as excellent (n = 29), very good (n = 25), good (n = 7), sufficient (n = 1), or bad (n = 2), or was not reported/not applicable (n = 15).
Conclusions: AI is playing an increasingly significant role in rhinology research. Articles are showing high rates of diagnostic accuracy and are being published at an almost exponential rate around the world. Utilizing AI in radiological diagnosis was the most published topic of research; however, AI in rhinology is still in its infancy and there are several topics yet to be thoroughly explored.


Introduction
Artificial intelligence (AI) is an increasingly exciting area of research in medicine. Amongst other benefits, AI has the potential to automatically perform complex tasks with great speed 1 and precision. Various applications of AI in medicine have already evolved from theoretical or proof-of-concept to being used in clinical practice, such as the automatic detection of atrial fibrillation via smartphone- or smartwatch-based ECG monitors 2,3 or continuous glucose monitoring to prevent hypoglycaemia. 3,4 In this field there are several definitions that should be considered (Figure 1). Machine learning (ML) is a subset of AI that uses prior data to make improved decisions about future data. 5 ML algorithms can be split into 3 main categories: supervised learning, unsupervised learning, and reinforcement learning. 6 Supervised learning requires labelled datapoints for a ML algorithm to learn from, in order to later make predictions on unlabelled data. 5 Unsupervised learning algorithms find patterns (eg, in cluster analysis) in unlabelled datasets. 5 Reinforcement learning is the training of a ML model to make a sequence of decisions to solve a task through a process of trial and error. 7 An artificial neural network (ANN) utilizes components of supervised and reinforcement learning to solve problems. ANNs use layers of processing to make sense of input information: the output of one layer becomes the input of the next, until the information has been transformed into an output that can be used by the network. 8 Deep learning is a subset of ANNs in which at least 3 layers are used in the network. 9 A convolutional neural network (CNN) is a type of ANN that processes data with grid-like structures (such as images). 10 Natural language processing (NLP) is an entirely different subset of AI that allows computers to understand humans by converting language into data that can be processed by a computer. 11
A considerable volume of research about possible applications of AI in the field of rhinology is available, yet no application has achieved widespread clinical use to date. This scoping review aimed to provide a brief overview of all current literature on AI in the field of rhinology. Further, it aimed to highlight implications for clinical practice and future research for rhinology researchers.
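To make the supervised-learning workflow described above concrete, the following minimal sketch (plain Python, entirely hypothetical toy data and labels) learns one centroid per class from labelled points, then predicts labels for unseen points. This is an illustrative nearest-centroid classifier, not an algorithm used by any of the reviewed studies:

```python
from collections import defaultdict

def fit_nearest_centroid(X, y):
    """Supervised step: learn one centroid (mean feature vector) per label."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {label: tuple(sum(col) / len(pts) for col in zip(*pts))
            for label, pts in groups.items()}

def predict(centroids, x):
    """Assign an unlabelled point to the label of its nearest centroid."""
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))

# Hypothetical labelled training data: two well-separated groups.
X_train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
y_train = ["healthy", "healthy", "diseased", "diseased"]

model = fit_nearest_centroid(X_train, y_train)
print(predict(model, (1.0, 0.0)))  # near the "healthy" centroid
print(predict(model, (5.0, 6.0)))  # near the "diseased" centroid
```

The key property illustrated is the one defined in the text: the algorithm can only predict labels it has seen during training, so the quality of those labels bounds the quality of its predictions.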

Materials and Methods
Following a preliminary search of the literature on the usages of AI in rhinology, a scoping review was conducted. The Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews 12 (PRISMA-ScR) checklist was used to guide the review. The review was not registered, as PROSPERO does not accept scoping reviews, literature reviews, or mapping reviews.

Eligibility Criteria
Eligible studies described an application of AI to solve a clinical question in the field of rhinology. Articles needed to be published since 2017 to capture the most up-to-date literature in this rapidly expanding field of research. 13 Unpublished literature is often reported in scoping reviews, 12 as one of their purposes is to map a body of knowledge to identify gaps in research. 14 Hence, conference and poster abstracts were eligible for inclusion provided they had been presented since 2020; grey literature from prior to 2020 was excluded. Articles were excluded if AI was only used in the statistical analysis of a paper, or if the use of AI was only to calculate radiation dosages in oncological settings. Articles that described the automatic segmentation of nasopharyngeal cancers in radiological scans were excluded if they were utilized for the purpose of determining radiation therapy/treatment. Works without original data, animal studies, and studies unavailable in English were also excluded.

Information Sources
A systematic electronic search was performed for relevant studies using the Ovid MEDLINE (1946-2022) and EMBASE (1974-2022) databases from January 1, 2017 until May 14, 2022 using a defined search strategy (Table 1). AI terms including "DEEP LEARNING" and "artificial adj2 intelligen*," rhinology terms including "NOSE" and "Rhin*," and investigation terms including "Endosco*" and "MRI*" were combined using the Boolean operators AND and OR to broaden and limit the search where appropriate. A manual bibliographic screen of the included and other relevant studies was performed to search for additional relevant articles undetected by the original search. An attempt was made to contact authors of studies where the full-text could not be found.

Study Selection
Duplicate studies were automatically removed using the OVID duplicate removal function, and later, manually. The remaining studies were exported to Rayyan (Qatar Computing Research Institute, Qatar), 15 an online review tool, for screening against the eligibility criteria outlined above. Study selection was performed by 2 authors (GO and RK); uncertainties were resolved by consensus. Studies were screened in 3 phases: first by title, then by abstract, and finally by full-text. Articles that met the eligibility criteria were included for data collection.

Data Collection
Data extracted from individual studies were recorded in Microsoft Excel for Mac (Version 16.63.1). 16 Data fields collected included: names of first and second authors; year of publication; study design; number of participants in total, as well as in the training, validation, and testing sets; demographic data of participants; country of publication; journal of publication; field of research; anatomical structures defined (if any); pathologies identified (if any); type of AI used; specific AI programs, algorithms, and software used; and diagnostic or prognostic accuracy (if applicable). There were several possible metrics of diagnostic accuracy. In order of preference, the receiver operating characteristic area under the curve (ROC-AUC), Dice coefficient, and accuracy were used; however, other metrics were accepted. Diagnostic or prognostic accuracy was rated as excellent if the ROC-AUC (or next best metric) was more than 0.9, very good if 0.8-0.9, good if 0.7-0.8, sufficient if 0.6-0.7, or bad if 0.5-0.6.
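The two less familiar elements of this scheme, the Dice coefficient and the rating bands, can be sketched in a few lines of plain Python. The set-based form of Dice and the handling of exact boundary values (e.g. a score of exactly 0.8) are assumptions here, since the bands as stated share their endpoints:

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two sets of segmented pixels/voxels:
    2 * |intersection| / (|pred| + |truth|). Equals 1.0 for perfect overlap."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def rate_performance(metric):
    """Map a ROC-AUC (or Dice/accuracy) value onto the review's rating bands.
    Boundary handling is an assumption: each stated band's lower endpoint
    is treated as inclusive."""
    if metric > 0.9:
        return "excellent"
    if metric >= 0.8:
        return "very good"
    if metric >= 0.7:
        return "good"
    if metric >= 0.6:
        return "sufficient"
    if metric >= 0.5:
        return "bad"
    return "below rating bands"

# A segmentation sharing 2 of 4 pixels with the ground truth: Dice = 0.5.
print(rate_performance(dice_coefficient({1, 2, 3, 4}, {3, 4, 5, 6})))  # "bad"
```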

Critical Appraisal of Individual Sources of Evidence
A critical appraisal or risk of bias assessment of individual studies is not typically performed in scoping reviews as their purpose is to identify the available evidence on a topic regardless of the methodological quality. 12

Results

Study Selection
The search strategy yielded a total of n = 2420 studies. After the removal of 680 duplicates, n = 1740 titles were screened. Abstract screening was performed on 205 articles. Full-text analysis was performed on 129 articles (2 full-texts were unable to be located), resulting in 62 articles being included from the search. An additional 35 records were identified and screened from manual citation searching, of which 17 were included in the final analysis (7 of which were from journals not indexed by MEDLINE or EMBASE), leading to a combined total of 79 articles ( Figure 2).
Applications of AI in rhinology were arranged into 5 major categories (Figure 4): phenotyping/endotyping (n = 12), radiological diagnostics (n = 42), prognostication (n = 10), non-radiological diagnostics (n = 7), and surgical assessment/planning (n = 8).

Discussion
The majority of publications on the use of AI in rhinology were in diagnostics (n = 49); most of these articles (n = 42) discussed radiological diagnostics in the form of CT, MRI, or X-ray imaging to automate identifying anatomical and pathological findings with CNNs, while the remaining (n = 7) detailed other diagnostic tools such as nasal endoscopy, OCT, or histopathology. Phenotyping/endotyping was the next most common category of AI usage in rhinology (n = 12). The majority of these articles utilized cluster analysis to generate novel sub-type groups in pathologies such as CRS, acute sinusitis, and AR. In the prognostication group (n = 10), ML techniques (unsupervised or semi-supervised) and cluster analysis were used to predict outcomes or responses to treatments, most commonly in CRS and AR. Surgical planning and assessment in rhinology was the least common area of AI research (n = 8), with newly evolving CNNs to aid pre-surgical assessment of nasal morphology and endoscopic navigational positions, intra-operative assessment with live annotation, analysis of surgical steps and video clipping, and post-operative tools such as surgical report generation and determination of prior rhinoplasty. The higher volume of AI rhinology research in radiological diagnostics compared to topics such as surgical planning and assessment is perhaps explained by a larger foundation of AI research in radiology in medicine as a whole, which is probably magnified by investments in the field by tech giants such as Google. 96,97 This would allow researchers to broadly apply previously published AI techniques to their own rhinological niche, whereas developing an AI for surgical planning and assessment requires a more novel technological approach. Throughout the articles, various AI programs were used; notably, U-Net, a CNN designed for biomedical image segmentation, and ResNet, an ANN designed for image recognition, were the most common.
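The cluster analysis underpinning the phenotyping/endotyping studies can be illustrated with a plain k-means loop. This is a generic sketch on hypothetical 2-D patient features (the reviewed studies used richer clinical variables, and not necessarily k-means); the deterministic seeding with the first k points is a simplification for clarity:

```python
def kmeans(points, k, iters=20):
    """Unsupervised k-means: partition unlabelled feature vectors into k
    clusters. Deterministic here: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, clusters

# Hypothetical patient features, e.g. (symptom score, eosinophil count).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(data, k=2)
print(sorted(len(c) for c in clusters))  # two clusters of 3 patients each
```

No labels are supplied anywhere: the two sub-type groups emerge purely from the structure of the data, which is what allows such analyses to propose novel phenotypes rather than confirm predefined ones.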
Articles frequently reported high diagnostic accuracy of their AI algorithms, with 36.7% having an ROC-AUC (or other similar metric) rated as excellent, 31.6% rated as very good, 8.9% as good, 1.3% as sufficient, 2.5% as bad, and 19.0% not reporting.

Implications for Clinical Practice
There are several promising areas in the field where AI has the potential to augment a rhinologist's practice. Although the risk of being entirely replaced by AI and robots is slim, 98 there are potential ethical and legal issues 99 with the implementation of AI in medicine which need to be addressed before AI can be considered for adoption into mainstream clinical practice.
In supervised learning models, data need to be labelled to train the ML algorithms. If the data are labelled incorrectly, the algorithm learns incorrectly, amplifying bias. This idea of 'garbage in, garbage out' 100 has been prevalent in computing since prior to the development of AI, 101 and in the field of AI is commonly referred to as algorithmic bias. 102 A meta-analysis comparing the decisions of a ML software, called Watson for Oncology, to a multidisciplinary team of experts found discordance in the decision making for treatment choices in lung cancers. 103 This was partially explained by the use of synthetic data to train the algorithm 104 as well as patient demographic differences. 103 Very large amounts of precisely labelled real data are essential for the development of supervised ML algorithms. 5,105 It is also important that the data used to train ML algorithms are reflective of the diversity of the population. A landmark study in 2018 demonstrated that facial recognition AI developed by companies including IBM and Microsoft performed significantly worse at recognizing darker-skinned females than lighter-skinned males, with error rates of up to 34.7% compared to 0.8%, respectively. 106 The algorithmic bias was attributed to the databases used to train the AI being disproportionately comprised of lighter-skinned subjects. 106,107
Reinforcement learning models such as ANNs, and unsupervised learning models such as cluster analyses, often use 'black box' algorithms, which are AI systems in which the methodology for obtaining an output is hidden from the operator. 108 An argument can be made in support of black box algorithms, as theoretically it should not matter how an algorithm gets to an answer if the answer is always correct. 109 The counterargument is that these algorithms do not always produce a correct answer and that transparency is essential in bias detection. 110 Furthermore, clinicians are also more likely to utilize AI if they are able to make sense of how the AI makes its decisions. 111
Another challenge for implementing AI in medicine is navigating liability regimes. At present, clinicians would be liable if they acted against best practice due to the advice of an AI algorithm. 99,112 In the future, if (or when) AI becomes more routine, we may have the corresponding case where clinicians could be liable if they act against the advice of the AI.
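The 'garbage in, garbage out' effect can be made concrete with a toy simulation (hypothetical 1-D data, not drawn from any reviewed study): a trivial classifier learns a decision threshold as the midpoint between the two class means, once from clean labels and once after a large fraction of positive cases have been mislabelled as negative. The mislabelled data visibly drag the learned boundary away from its correct position:

```python
import random

def train_threshold(X, y):
    """Learn a 1-D decision threshold: the midpoint between class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

rng = random.Random(42)
X = [rng.gauss(0, 1) for _ in range(500)] + [rng.gauss(3, 1) for _ in range(500)]
y = [0] * 500 + [1] * 500  # true labels

# "Garbage in": mislabel ~80% of the positive cases as negative.
noisy_y = [0 if (label == 1 and rng.random() < 0.8) else label for label in y]

clean_threshold = train_threshold(X, y)        # close to 1.5
noisy_threshold = train_threshold(X, noisy_y)  # pushed upward by mislabels
print(noisy_threshold > clean_threshold)
```

The model itself is unchanged between the two runs; only the labels differ, which is precisely why large volumes of precisely labelled real data matter so much for supervised learning.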

Implications for Future Research
The obvious gaps identified from this review are the comparative scarcity of publications about AI in phenotyping or endotyping pathologies, prognostication, non-radiological diagnostics, and surgical planning and assessment, relative to radiological diagnostics.
However, the most important gap is the shortage of large datasets, which are needed to develop rigorous algorithms. As previously discussed, large datasets of precisely labelled data are needed for supervised learning models to prevent algorithmic bias. Before widespread adoption of AI into rhinology clinical practice, there needs to be a collaborative effort to collect appropriate datasets that reflect the demographics of the population. National registries have been used in orthopaedics 113 and cardiology 114 for this purpose. However, with the increased volume of patients in national registries may come a decrease in the quality of the data. 115

Conclusions
AI research in rhinology is increasingly common, with publications per year from around the globe increasing at an almost exponential rate. Utilizing AI in radiological diagnosis was the most published topic of research; however, AI in rhinology is still in its infancy and there are several topics in prognostication and surgical planning yet to be thoroughly explored.

Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Richard J Harvey is a consultant/advisory board member with Medtronic, Novartis, GSK and Meda Pharmaceuticals; he has received research grant funding from GlaxoSmithKline and has been on the speakers' bureau for GlaxoSmithKline, AstraZeneca, Meda Pharmaceuticals and Seqirus. Janet Rimmer has received honoraria from Sanofi-Aventis, Novartis, Mundipharma, bioCSL and Stallergenes. Larry Kalish is on the speakers' bureau for Care Pharmaceuticals, Mylan and Seqirus Pharmaceuticals. Raymond Sacks is a consultant for Medtronic and is on the speakers' bureau for Meda Pharmaceuticals. All other authors have no financial disclosures or conflicts of interest.

Funding
The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material
Supplemental material for this article is available online.