A practical test for retronasal odor identification based on aromatized tablets

Background: Olfactory perceptions elicited by odors originating from within the body (retronasal olfaction) play a crucial role in well-being and are often disrupted in various medical conditions. However, the assessment of retronasal olfaction in research and clinical practice is impeded by the lack of commercially available tests and limited standardization of existing testing materials. New Method: The novel ThreeT retronasal odor identification test employs 20 flavored tablets that deliver a standardized amount of odorous stimuli. The items represent common food- and non-food-related odors. Results: The ThreeT test effectively distinguishes patients with olfactory dysfunction from healthy controls, achieving a specificity of 86% and sensitivity of 73%. Its scores remain stable for up to 3 months (r = .79). Comparison with existing method: The ThreeT test exhibits a strong correlation with the "Tasteless powders" measure of retronasal olfaction (r = .78) and classifies people into healthy and patient groups with similar accuracy. Test-retest stability of ThreeT is slightly higher than the stability of "Tasteless powders" (r = .79 vs r = .74). Conclusions: ThreeT is suitable for integration into scientific research and clinical practice to monitor retronasal odor identification abilities.


Introduction
The human sense of smell is stimulated by odorous molecules reaching the olfactory epithelium via two distinct routes: through the nose (orthonasal olfaction) or through the mouth, oropharynx and nasopharynx (retronasal olfaction). These two types of olfaction differ not only in the delivery route, but also evoke distinct patterns of brain activity (Hummel, 2008; Small et al., 2005), rely on different perceptual and cognitive processes (Hannum et al., 2018; Pellegrino et al., 2021), and present different trajectories of age-related decline (Li et al., 2022). These differences suggest that the orthonasal and retronasal subtypes of olfactory perception should be treated as distinct domains and investigated independently.
Direct comparisons of retronasal and orthonasal olfactory function show that retronasal olfactory perception might be better preserved than orthonasal olfaction in patients with smell loss due to chronic rhinosinusitis (James et al., 2022), Parkinson's Disease (Aubry-Lafontaine et al., 2020) or other causes of olfactory dysfunction (Landis et al., 2005). However, retronasal olfaction impairment affects patients' quality of life to a greater extent than deficits in orthonasal olfaction (Oleszkiewicz, Park, et al., 2019). This may be a result of altered perception and decreased enjoyment of food (Croy, Nordin, et al., 2014). Additionally, lower retronasal odor identification ability has been linked to increased symptoms of psychiatric disorders like anxiety, depression, schizotypy, and autism spectrum disorder (Pal et al., 2021).
All these findings highlight the meaningful contribution of retronasal olfaction to various medical conditions and quality of life, and call for further investigation. However, such investigation requires validated olfactory tests and cannot rely on self-ratings, as many people are unaware of their retronasal dysfunction (De Rosa et al., 2019; Liu et al., 2020; Negoias et al., 2020) and often confuse it with taste loss (Deems et al., 1991). While multiple tests for orthonasal olfaction testing are widely available, the selection of retronasal olfaction tests is much more limited (Doty, 2018; Fahmy and Whitcroft, 2022; Whitcroft et al., 2023).
The available tests of retronasal olfaction differ in form and type of stimuli used. They may use grocery-available aromatized products that are pulverized, e.g., coffee powder or curry (Heilmann et al., 2002; Niklassen et al., 2022); flavored sorbitol candies (Renner et al., 2009); or aromatized powders without a taste component (Yoshino et al., 2020). Most of these tests assess odor identification ability (Özay et al., 2019), but a retronasal olfactory threshold test is also available (Yoshino et al., 2021). Each type of retronasal test comes with limitations. On the one hand, pulverized commercial products and sorbitol candies have taste components that hinder the ability to exclusively assess olfaction. On the other hand, tests in tasteless powdered forms usually require a trained person to administer, and the amount of powder presented to the patient is difficult to standardize. Finally, some of these tests are not commercially available, which makes them unsuitable for regular clinical practice or research.
Here, we present a novel retronasal odor identification test called the "taste tablet test" or "ThreeT". This test involves flavored tablets that deliver a standardized amount of the aromatized stimuli and have only a minimal, barely perceptible sweetish taste component due to their composition of dextrose, sorbitol and acesulfame K. We verified the test's validity by comparing it with the "Tasteless powders" retronasal odor identification test (Yoshino et al., 2020), and by comparing the scores between patients with self-reported smell loss and healthy controls. Additionally, as previous research found a relationship between retronasal odor identification and well-being (Oleszkiewicz, Park, et al., 2019), we included measures of well-being in our study to replicate the findings. Finally, we examined test reliability using the test-retest method. As the test is easy to administer, commercially available, and delivers a standardized amount of odorous stimulus, it expands the repertoire of retronasal olfaction testing methods.

Ethics Statement
The study was conducted according to the guidelines of the Declaration of Helsinki on Biomedical Studies Involving Human Subjects. All participants provided informed written consent. The study was approved by the Institutional Review Board at Technische Universität Dresden (decision: BO-EK-556122022).

Participants
One hundred sixty-two participants aged between 19 and 79 years completed the study. Of these, 79 were patients (M_age = 52.8 years, SD = 13.7 years; 43 women) of a tertiary Smell & Taste Clinic who reported subjective problems with their sense of smell upon admission. We also recruited 83 healthy participants (M_age = 40 years, SD = 14.9 years; 57 women), none of whom declared any chemosensory problems. The gender ratio was similar across patients and healthy participants (χ²(1) = 2.90, p = .089). Patients were on average older than their healthy counterparts, F(1, 160) = 31.81, p < .001.
Seventy-nine healthy participants (M_age = 39.1 years, SD = 13.7 years; 54 women) returned for a follow-up appointment, scheduled within three months after the baseline measurement, to verify the test-retest stability of the retronasal tests.

Subjective assessment
All participants rated their smell function and nasal passage patency using an ordinal scale ranging from 0 ("no smell" or "clogged", respectively) to 5 ("very good"). Additionally, they rated their smell and taste functioning from 0% to 100%.

Psychophysical olfactory assessment
2.3.2.1. Orthonasal olfaction. We used the Threshold and Identification subtests from the Sniffin' Sticks battery (Hummel et al., 1997). The Threshold test measures the sensitivity of the sense of smell with a set of 16 dilutions of phenyl ethyl alcohol. The highest concentration is a 4% dilution in odorless propylene glycol, which is further diluted in a 1:2 ratio until 16 concentrations are obtained. During each trial the participant is presented with three sticks (one containing the target odor and two containing an odorless solvent) and is asked to select the stick containing the target odor. Trials proceed from the lowest concentration toward higher concentrations until two correct responses are given for the same concentration. Then, the direction of presentation reverses and lower concentrations are presented until the first incorrect response is given. The procedure continues until 7 reversal points are reached, and the average of the last 4 reversals is the final score. It ranges from 1 to 16 points, with higher scores indicating greater olfactory sensitivity.
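The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the step numbering (1 = strongest dilution, 16 = weakest, so that higher scores mean greater sensitivity) is an assumption made for illustration.

```python
from statistics import mean


def dilution_series(top_percent=4.0, steps=16, ratio=2.0):
    """Concentrations in percent, strongest first, each step diluted 1:2."""
    return [top_percent / ratio ** i for i in range(steps)]


def threshold_score(reversal_steps, n_reversals=7, n_averaged=4):
    """Average the last 4 of the 7 staircase reversal points.

    reversal_steps: dilution step (assumed 1..16) at each reversal, in order.
    """
    if len(reversal_steps) < n_reversals:
        raise ValueError("staircase ended before the required number of reversals")
    used = reversal_steps[:n_reversals]
    return mean(used[-n_averaged:])
```

For example, a hypothetical run with reversals at steps 8, 6, 9, 7, 9, 8 and 10 would be scored from the last four values only (7, 9, 8, 10).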
In the Identification test, the participant is presented with 16 odors and, for each, is asked to identify the odor from a set of 4 descriptors presented as labeled pictures (1 correct, 3 incorrect). Each correct identification is scored with 1 point and the final score ranges between 0 and 16 points.

Retronasal olfaction.
The "loto des saveurs" (Sentosphère, Paris, France) employs 32 flavored tablets. Based on the pilot study (see Section 2.4.1 for details) we selected 20 tablets for the main validation study. During each trial the participant is asked to put a tablet on their tongue, chew on it for 15-20 seconds to release the odor, and exhale air through the nose. Next, the participant identifies the odor from a list of verbal descriptors (1 correct, 3 incorrect). After each trial the participant rinses their mouth with water. For each correct identification the participant receives 1 point and the final score ranges from 0 to 20 points. The list of all items used in the study is presented in Table 1.
As the reference test we used the Tasteless powders retronasal odor identification test (Yoshino et al., 2020). This test involves 20 different flavored powders evoking no taste perception (Givaudan Schweiz AG, Dubendorf, Switzerland). During each trial, the experimenter puts approximately 0.05 g of the powder on the participant's tongue with a wooden spout. Next, the participant moves the powder inside their mouth and exhales air through the nostrils. The task is to identify the presented odor from a list of 4 descriptors (1 correct, 3 incorrect). Each correct identification is awarded 1 point and the final score ranges from 0 to 20 points.

Psychological assessment
Participants' well-being was assessed using the WHO-5 questionnaire (Topp et al., 2015). The questionnaire consists of 5 affirmative statements related to one's life satisfaction and well-being. The responses are given on a scale from 0 to 5 and the final score ranges from 0 to 25, with higher scores indicating greater well-being.
We measured participants' positive and negative affectivity using the PANAS scale (Watson et al., 1988). In this scale participants rate the frequency of different positive and negative affective states that they have experienced during the last 12 months. Scores range from 10 to 25 and higher scores indicate greater frequency of positive and negative affective states.

Pilot study
The first stage of the pilot study included pilot testing and a survey about the familiarity of the available 32 odors. Pilot testing was divided into two phases. In phase one, 21 participants identified the odors without cues. In phase two, distractor items were chosen from the available odors themselves, the U-Sniff test (Schriever et al., 2018) and a previous study about odor familiarity in adolescents (Fjaeldstad et al., 2017). The second phase of pilot testing was performed on 25 participants, who identified each odor from four options. Simultaneously, we ran a survey about odor familiarity in which 128 participants rated each item as very familiar, familiar, somewhat familiar or unknown. Based on the results of pilot testing and the survey we reduced the set to 22 tablets.
In the second stage, we applied Item-Response Theory (IRT) analysis to the dataset collected for the main study. We examined the difficulty and discrimination parameters of each of the 22 items to remove the items without additional diagnostic value. As the responses were coded as correct or incorrect, we employed a 2-parameter logistic model. The following assumptions for performing an IRT analysis were verified: (1) the assumption of monotonicity was assessed based on item characteristic curves and all curves exhibited similar shapes, demonstrating that the assumption was met; (2) the assumption of unidimensionality was met as all items were measuring the same ability, i.e., retronasal odor identification; (3) local independence of the items was achieved as a correct response to one item did not influence the chance of correctly identifying the subsequent items; (4) the assumption of item invariance was met as the items should not be more easily identified by any of the groups due to factors other than the olfactory ability. Overall, two items (Pineapple, Lime) showed the lowest item-information level and were therefore discarded from the test. Details of this analysis are presented in Supplementary File 1.
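For readers unfamiliar with the 2-parameter logistic model, the sketch below shows its standard textbook form: the probability of a correct response as a function of ability, and the resulting item information. The function and parameter names (theta for ability, a for discrimination, b for difficulty) are illustrative conventions, not taken from the authors' analysis scripts.

```python
import math


def p_correct(theta, a, b):
    """2PL item characteristic curve: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)
```

Under this model an item is most informative at theta equal to b, and its peak information grows with the square of a, which is why weakly discriminating items contribute little and are natural candidates for removal.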

Main study
For the main study all participants provided their demographic data and patients were additionally interviewed about their olfactory dysfunction. Further, orthonasal and retronasal olfactory tests were administered. For each participant, the order of orthonasal and retronasal testing was randomized. Additionally, the two retronasal tests were also administered in randomized order to minimize the potential confounding effect of the order of testing. After the psychophysical testing, participants filled in the WHO-5 and PANAS questionnaires.
Healthy participants were additionally invited for a retest session within 3 months after the initial test. During the retest session only the two retronasal tests were administered, in a randomized order.

Statistical approach
All analyses were performed with R statistical software (R Core Team, 2017) with the significance level set to p < .05. We used the packages dplyr, psych, pROC, and effectsize for data analysis (Ben-Shachar et al., 2020; Revelle, 2023; Robin et al., 2011; Wickham et al., 2023), and the packages ggplot2 and BlandAltmanLeh (Lehnert, 2015; Wickham, 2016) for data visualization. In the first step, we analyzed the data distribution to assess its normality. As the absolute value of skewness of all variables was < 1.25, we assumed the data did not strongly deviate from the normal distribution (Kim, 2013) and therefore performed parametric tests. Only for the variables measured with an ordinal scale (the ratings of smell function and nasal passage patency) did we use non-parametric tests. For the group comparisons, we used analysis of variance (ANOVA) and the non-parametric Kruskal-Wallis test. A paired t-test was used to compare scores in the Tasteless powders and ThreeT tests. As the groups differed in age, we included it in the ANOVA models as a covariate. We used Pearson correlation analysis to verify the linear associations between the variables. To verify whether retronasal odor identification scores might be used for classifying a participant as patient or healthy, we used binomial logistic regression models. We calculated the tests' sensitivity and specificity, setting the threshold for predicted probabilities to .50. We presented it graphically as Receiver Operating Characteristic (ROC) curves with a comparison of Area Under Curve (AUC) for both tests.
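The classification step described above can be sketched as follows. The analyses in the study were run in R; this Python fragment is only an illustration of how sensitivity and specificity follow from predicted probabilities at a .50 threshold. The function name is hypothetical, and treating a probability exactly equal to the threshold as "patient" is an assumption.

```python
def sensitivity_specificity(probs, is_patient, threshold=0.5):
    """Return (sensitivity, specificity) for predicted patient probabilities.

    probs: predicted probability of being a patient, one per participant.
    is_patient: True for patients, False for healthy controls.
    """
    tp = fn = tn = fp = 0
    for p, patient in zip(probs, is_patient):
        predicted_patient = p >= threshold  # assumption: ties count as patient
        if patient and predicted_patient:
            tp += 1
        elif patient:
            fn += 1
        elif predicted_patient:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the share of patients classified as patients; specificity is the share of healthy controls classified as healthy. Both depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds.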
Descriptive statistics for the orthonasal and retronasal olfactory test scores are presented in Table 2.
Patients scored lower in both retronasal tests (ThreeT: F(1, 158) = 101.25, p < .001, partial η² = .39; Tasteless powders: F(1, 159) = 113.01, p < .001, partial η² = .42). The obtained effect sizes for both models suggest that the magnitude of the group differences is similar for both tests. Age was not a significant covariate in either model (both p > .05). These results are presented in Fig. 2.
Overall, scores in the ThreeT test were lower than scores in the Tasteless powders test, t(160) = 8.31, p < .001. Descriptive statistics demonstrated that patients showed greater variability of the scores in the Tasteless powders test than in the ThreeT test (SD = 4.88 vs. SD = 4.06), whereas healthy controls exhibited the opposite pattern (Tasteless powders SD = 1.98, ThreeT SD = 3.03).

Comparison of the retronasal tests validity and reliability
The retronasal ThreeT test shows good theoretical validity, as the scores from both retronasal tests were strongly and positively correlated (r = .78, p < .001) and this correlation had similar strength for both patients and healthy controls (see Fig. 3). Additionally, the Bland-Altman plot suggests that these two tests present good agreement in measuring retronasal odor identification, as most of the participants fall between the horizontal lines of agreement (see Fig. 4).
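As a reminder of what the lines of agreement in a Bland-Altman plot represent, the sketch below computes them in the conventional way: the mean difference between paired scores (bias) plus and minus 1.96 sample standard deviations of the differences. The function name is hypothetical and the 1.96 multiplier is the usual convention, assumed here rather than confirmed by the source.

```python
from statistics import mean, stdev


def bland_altman_limits(scores_a, scores_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired test scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread
```

In the plot, each participant's difference between the two tests is drawn against the average of their two scores; points between the lower and upper limits are the ones described as falling within the lines of agreement.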
Patients' well-being and positive and negative affectivity were not significantly correlated with their retronasal abilities measured with the ThreeT (r ranging from −.08 to .17, p > .146) or Tasteless powders tests (r ranging from −.06 to .17, p > .128).
Both retronasal tests showed good test-retest reliability in healthy participants, meaning that the scores obtained in the retronasal tests are consistent over time. The correlation between scores obtained at the first and second appointment was r = .79, p < .001, and r = .74, p < .001, for the ThreeT and Tasteless powders tests, respectively (see Fig. 5).
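The test-retest coefficients reported above are Pearson correlations between first and second appointment scores. A minimal, dependency-free sketch of that computation is given below; the function name is hypothetical, and the study itself used R rather than Python.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired scores, e.g. test and retest."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

Note that a high Pearson r indicates that participants keep their relative ranking across sessions; it does not by itself rule out a uniform shift in scores between appointments.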

Sensitivity and specificity of the retronasal tests
The binomial logistic regression models showed that both retronasal ThreeT test scores (SE = .06, p < .001) and Tasteless powders test scores (b = -.47, SE = .08, z = -6.238, p < .001) were significant predictors in classifying participants into the patient or healthy group. The ThreeT test showed a sensitivity of 71.8% and a specificity of 85.5%, whereas the Tasteless powders test showed a sensitivity of 75.9% and a specificity of 89.2%. The ROC curves showed that both tests have similar classification abilities, as indicated by AUC values equal to .86 (see Fig. 6).

Discussion
In the present study, we demonstrate that flavored tablets can be used as a valid and reliable test of retronasal odor identification ability. The validity of the ThreeT relies on successful distinction between patients with subjective smell dysfunction and healthy controls as well as the high correlation with the validated and reliable Tasteless powders retronasal odor identification test (Yoshino et al., 2020). The reliability of the test was verified by the correlation between scores of the same participants obtained twice over the course of 3 months. The reliability of the test was high, and comparable with the stability of the Tasteless powders test.
Both retronasal tests used in the study are almost equally effective in distinguishing between patients and healthy controls. Therefore, either of them might be used depending on test availability and preferences. However, the ThreeT test seems to be more difficult, as participants obtained lower scores in this test compared to the Tasteless powders test. Due to the increased difficulty, healthy participants show greater variability in the ThreeT test scores, which is reflected in the calculated standard deviations and represented visually in Fig. 2. Thus, use of the ThreeT test might be preferred in research focusing specifically on healthy populations where inter-subject variability is desired. In contrast, the standard deviations obtained in the patient group suggest that whenever research focuses primarily on patients with chemosensory dysfunction, the Tasteless powders test might provide greater variability of the scores.
With a sensitivity of 73%, 27% of patients with smell dysfunction will obtain scores in the ThreeT test similar to healthy controls. However, in our study we tested patients with complaints about olfactory dysfunction without specifying whether the dysfunction affects orthonasal or retronasal olfaction. This might limit the test sensitivity. Importantly, the test has a high specificity of 86%, which suggests that there will be less than 15% false-positive diagnoses.
Currently, the ThreeT test does not have gender- and age-stratified normative data. Before more data are collected to provide reference for specific groups, we suggest applying the same criterion for diagnosing olfactory dysfunction as is used in the orthonasal Sniffin' Sticks test, i.e., the 10th percentile obtained in the healthy group (Oleszkiewicz, Schriever, et al., 2019). Based on this, for the presented test a score
One limitation of the ThreeT is the presence of dextrose, acesulfame K and sorbitol that evoke a slight sweetish taste perception. These components are included in the tablets as they were primarily designed as a game for which the presence of sweet stimuli was not an issue. Being aware of this limitation, we aimed to verify whether such commercially available and ready-to-use tablets may serve as a retronasal olfaction test. The validation analyses demonstrated that despite the presence of the sweet compounds, the test allows for successful distinction between patients and healthy participants. Another limitation is that the test-retest reliability was verified only for the healthy participants. Therefore, future studies should verify whether the test-retest reliability is similar in the patient population. Finally, some of the retronasal tests require participants to block their nostrils during retronasal stimulation (Yoshino et al., 2020), whereas in other methods it is not mandatory (Renner et al., 2009). In the presented study, participants did not block their noses during the retronasal testing procedure and there are no data available suggesting how such a methodological choice influences the findings. The impact of nostril blockage on the validity of retronasal testing should be systematically examined in future studies.
The next steps of the test development may involve establishing a short, screening version of the test, as has been done for other retronasal (Besser et al., 2020; Niklassen et al., 2022; Pieniak et al., 2022) and orthonasal (Joseph et al., 2019; Parma et al., 2021; Sorokowska et al., 2019) olfactory tests. Additionally, cross-cultural validation of the test is necessary, as previous research demonstrated cultural differences in retronasal perception (Croy, Hoffmann, et al., 2014).
Interestingly, we did not find the previously described correlation between retronasal olfaction and well-being or depression (Oleszkiewicz, Park, et al., 2019). We speculate that this might arise from different retronasal identification tests being used. Oleszkiewicz, Park, et al. (2019) used grocery products like spices or instant drinks that participants are likely to have used before in their daily life (Heilmann et al., 2002), whereas we used manufactured aromatized tablets that participants had little previous experience with. It is possible that impaired recognition of grocery product flavors strongly affects patients' well-being as it relates to their daily experiences and difficulties faced during food preparation and intake (Croy, Nordin, et al., 2014). In contrast, the inability to recognize flavors in tablets might be too distant from the challenges faced by patients with olfactory dysfunction to diminish their well-being. This assumption calls for further empirical verification.

Conclusions
The ThreeT is ready to be used in scientific research and in clinical practice for the assessment of retronasal odor identification abilities. The test exhibits diagnostic properties similar to those of a validated and reliable retronasal odor identification test, delivers standardized amounts of odorous stimuli, and shows high test-retest reliability.

Fig. 3. Correlation between the two retronasal tests presented for patients and healthy controls separately. Bands of the regression lines represent standard errors.

Fig. 4. Bland-Altman plot showing the distribution of score difference between the two retronasal tests against the average score of these two tests.

Fig. 5. Test-retest correlations for the ThreeT test (panel A) and Tasteless powders test (panel B). Bands of the regression lines represent standard errors.

Table 1
Items of the ThreeT tablets test used in the main study.

Table 2
Descriptive statistics for the orthonasal and retronasal olfactory test scores obtained during the first appointment.