Is AI Augmenting or Substituting Humans? An Eye-Tracking Study of Visual Attention Toward Health Application

In this paper, the authors focus on artificial intelligence as a tangible technology that is designed to sense, comprehend, act, and learn. There are two manifestations of AI in the medical service: an algorithm that analyzes and interprets the test result, and a virtual assistant that communicates the result to the patient. The aim of this paper is to consider how AI can substitute for a doctor in measuring human health and how interaction with a virtual assistant affects one's visual attention processes. Theoretically, the article refers to the following research strands: human-computer interaction, technology in services, implementation of AI in the medical sector, and behavioral economics. An eye-tracking experimental study demonstrates that the perception of a medical diagnosis does not differ across experimental groups (human vs. AI).


INTRODUCTION
Artificial Intelligence (AI) is being applied to more and more areas of everyday life, including medical diagnostics (McDougall, 2019). In the past two decades, many types of IT-based systems have been broadly utilized in health care across different domains (Rajabion, Shaltooki, Taghikhah, Ghasemi, & Badfar, 2019). Various scenarios have been made possible by rapid advances in information and communications technologies and by the increasing number of smart things (portable devices and sensors) (Kashyap, 2020). The promise of artificial intelligence in health care offers substantial opportunities to improve patient and clinical team outcomes, reduce costs, and influence the health of the general population. From the patient's point of view, it enables constant monitoring of the body's condition and analysis of the results through comparison with a vast database containing information no doctor or human analyst would have time to process (Grzywalski et al., 2019). From the doctor's point of view, it allows them to diagnose patients without having to have direct contact with them. And the doctor, thanks to AI support, can focus on patients who need direct personal contact (Davenport & Kalakota, 2019).

AI in Services
One of the fundamentals of service science says that interaction takes a unique role in services, because the value is usually delivered to a customer during those interactions (Maglio & Spohrer, 2008). However, it is not always a human actor who interacts with a customer; technology can also play such a role in service delivery. Specifically, technology can play an augmenting or substituting role for a human actor in service delivery (Marinova, De Ruyter, Huang, Meuter, & Challagalla, 2017). It can enhance the human ability to think (e.g., by collecting and analyzing data) and to act (e.g., by communicating). In some settings, technology can interact with customers faster, cheaper, and in a more convenient way than a human, all in the name of service delivery (De Keyser, Köcher, Alkire, Verbeeck, & Kandampully, 2019). It can also be used by a customer who consumes a service, e.g., using a smart band to collect medical data from the person who wears it and send it to an algorithm that provides real-time analysis and alerts a doctor if the results exceed allowable levels.
According to Marinova et al. (2017), technology can also replace a human actor who delivers a service. A chatbot or Virtual Assistant (VA) can be one example of such substitution, as they are made available thanks to AI-based technology (Kot & Leszczyński, in press). A VA can communicate with a patient who uses the medical service to present the results of medical tests, comment on those results, and schedule a meeting with a doctor if needed. Bots and VAs can be shaped to appear human-like, animal-like, cartoon-like, or functional (Fong, Nourbakhsh, & Dautenhahn, 2003) to facilitate natural and effective interaction. Their interactive behavior can be programmed to include an emotional display, gaze, personality, sense of humor, or even gestures and movements (De Keyser et al., 2019). Their interface can be based on text, voice, graphics, touch, or biometrics. Wijesinghe et al. (2019), in a research paper on an Intelligent Diabetic Assistant, present such a system, which decides the diagnosis and the treatment priority depending upon the observations that appear on the screen. Summing up, both sides of a health care service interaction can use the technology as a part of the service. It allows clinical services to be delivered to patients remotely, into areas which were previously served by well-trained human clinicians (Aborujilah et al., 2020). Other examples of such applications in health care are the virtual nurse assistants Sensely (sensely.com) or Angel (careangel.com).
Thus, AI can augment or replace humans in service; it can be utilized by a customer or by a service company, and both scenarios are plausible. This can already be seen in the medical sector, where more and more data on personal health measurements can be collected and analyzed, and doctors are increasingly burdened with more and more cases (Reddy, Fox, & Purohit, 2019). Applying solutions based on AI to support and help is at the very least reasonable in these situations.
Current work on human-AI interaction suggests that in every mode of interaction, an interface plays a vital role, as it facilitates mutual communication (Li, 2015). The communication features of an AI system can influence how humans approach it (Mou & Xu, 2017). A simple AI system does not deliver the capability of self-adaptation that results from an advanced comprehension of human verbal and non-verbal communication and the ability to answer correctly (Fanelli, Gall, Romsdorfer, Weise, & Van Gool, 2010). Advanced, interactive AI systems are able to imitate humans and to mimic human users (Khan & Das, 2018). One of the first studies was conducted in the 1960s at the MIT Artificial Intelligence Laboratory (Weizenbaum & McCarthy, 1976). Researchers programmed an AI system to act like a doctor who conducts a therapeutic process with a patient. Despite the simplicity of that system, some patients attributed human-like feelings to it and even wanted to continue the therapy. Today, researchers do not focus so much on tricking AI users (as AI obviously can do this) but on their attitude toward AI systems. User satisfaction, the sense of personalized service, engagement in conversations, and trust in AI are considered (Przegalinska, Ciechanowski, Stroz, Gloor, & Mazurek, 2019).

AI in the Health Care Sector
In the near future, AI-based applications can potentially be used in a variety of areas related to prevention, treatment, and rehabilitation. AI can support the work of doctors, making it easier to make the right diagnosis or to choose the optimal path of treatment for the patient (Reddy et al., 2019). At the same time, AI presents an opportunity for the development of personalized medicine. Research on new molecules is moving towards personalized therapies, so the pharmaceutical product can be selected for the patient in terms of their specific genetic profile. Thus, another area of AI application in the health care sector is the drug research and development process (Mak & Pichika, 2019). Solutions using intelligent and personal robots are beginning to be successfully introduced into medical devices (Corritore, Wiedenbeck, Kracher, & Marble, 2012). AI is expected to revolutionize the way medicine is practiced, improve process efficiency, and thus change the way health care is provided (Kassam & Kassam, 2020).
The next area is the clinical decision-support system (CDSS), defined as "any electronic system designed to aid directly in clinical decision-making, in which characteristics of individual patients are used to generate patient-specific assessments or recommendations that are then presented to clinicians for consideration" (Kawamoto, Houlihan, Balas, & Lobach, 2005). CDSSs apply the best-known medical knowledge to patient data to generate case-specific decision support, particularly in preventive care services and treatment recommendations (Bright et al., 2012). The technology allows cheap and effective diagnostics, detecting any deviations from standards and analyzing the results by comparing them with the huge number of results stored in the database. The development of such systems is essential in countries like Poland, where only one in three visits to a primary care physician is justified by health problems. In many situations, the patient goes to the doctor only to be reassured that he or she is healthy (Grzela & Kuczyńska, 2019). CDSS solutions allow a doctor to quickly and effectively determine who needs medical consultations and additional diagnostic procedures under the supervision of a specialist. This can lead to savings in time and resources for doctors, patients, and medical facilities.
Aside from the many technical limitations of current AI technologies in comparison with human vision, language processing, and context-specific reasoning, other distinctive challenges exist (Reddy et al., 2019). Despite the growing use of automated AI-based solutions by patients, social skepticism is still noticeable regarding these applications (Hengstler, Enkel, & Duelli, 2016). According to research conducted in Great Britain, 63 percent of adults are concerned about the use of sensitive data, especially by artificial intelligence, to improve health care (Fenech, Strukelj, & Buson, 2018). Moreover, in the situation of technological development and the disappearance of the human element in health care, one can speak of the emotional cost of this technological progress. A critical foundational relationship in health care interactions is the trust between patients and doctors (LaRosa & Danks, 2018). Patients are accustomed to interacting with people at various points of contact in the health care service: people who take their opinions, preferences, and situations into account in decisions, take an interest in their emotional state, and provide support in this regard (Siemens Healthineers, 2019). Patients are skeptical about the new situation in which they need to communicate with a VA instead of an empathic and trusted person. According to Syneos Health Communications (2018) research, less than 20 percent of patients see any benefit to receiving a diagnosis or treatment recommendations from a virtual assistant. The black-box nature of AI produces a shiver of discomfort for many people. How can we trust our health, let alone our very lives, to decisions whose pathways are unknown and impenetrable? (Feldman, Aldana, & Stein, 2019).

Hypothesis Development
Although AI systems are increasingly appearing in the area of health, people seem to be suspicious about such solutions. Their fears arise from their belief that the machine cannot recognize the uniqueness of the symptoms and features of the disease (Longoni, Bonezzi, & Morewedge, 2019). Patients are convinced that, as individuals, they are unique and different from everyone else. Research also shows that relying on the diagnosis of a doctor (instead of a computer) reduces the patient's sense of responsibility for the decision (Promberger & Baron, 2006, study 1). In turn, receiving a recommendation from a computer increases the tendency to search for information (Promberger & Baron, 2006, study 2). A human decision provider is preferred even when people are aware that computers can outperform humans (Dietvorst, Simmons, & Massey, 2015). It was also demonstrated that patients perceive physicians who do not use computer-based solutions to provide a decision more positively (Shaffter, Probst, Merkle, Arkes, & Medow, 2012). Patient resistance to AI-based solutions may also be due to a digital environment that limits trust and contributes to people's greater suspicions (Taddeo, 2009; Ferrario, Loi, & Vigano, 2019).
In this paper, it is anticipated that using AI-based systems in health care evokes fears and concerns in patients, leading to a lower willingness to rely on a medical diagnosis provided by a Virtual Assistant instead of a human. As a consequence, users exposed to a medical diagnosis provided by a Virtual Assistant would be more motivated to search for information and to confront a real doctor with that recommendation. In this paper, we aim to investigate those assumptions regarding patients' fears with the use of non-declarative techniques related to eye movements (Henderson & Hollingworth, 1998). It is assumed that if participants are more suspicious about a diagnosis from a VA, as their goal-orientation is focused on seeking more information, they will process information more deliberatively. This, in turn, can lead to different task-related attentional processes as measured by eye fixation patterns (Orquin & Loose, 2013). It is hypothesized that:
Hypothesis One: Participants presented with the diagnosis from a virtual assistant spend more time reading the diagnosis.
Hypothesis Two: Providing the diagnosis from a virtual assistant enhances participants' focus of attention on the signature of the diagnosis provider.
Hypothesis Three: Elements of the mobile app that are related to reducing concerns (the contact button) gain more eye fixations in the virtual assistant condition.
In this study, the top-down control of attention perspective is adopted, assuming that attention is driven by goal-orientation (Theeuwes, 2010). The second perspective, bottom-up control of attention, emerging from the visual and textual properties of the stimuli, is rejected because participants saw the same stimuli and the only differentiating factor was the source of information. To justify our hypotheses, we rely on evidence that uncertainty and the need to acquire new information can influence eye movement (Gottlieb, Hayhoe, Hikosaka, & Rangel, 2014), and that fixation patterns reflect different cognitive goals and vary due to the variety of tasks or problems that need to be solved (Hayhoe, 2004).
In order to test the proposed hypotheses, we designed an eye-tracking experiment. The current study aims to measure attentional processes (reflected by eye movements) toward specific visual elements of the mobile app. To this purpose, we define areas-of-interest (AOIs) covering the content of the diagnosis, the signature of its source, and the contact-a-doctor button, which represents the possibility of contacting a real doctor.
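To illustrate how gaze samples can be mapped to AOIs of this kind, a minimal sketch follows. The three AOI names match the regions described above, but every coordinate is a hypothetical placeholder, not the study's actual screen layout.

```python
# Sketch of AOI hit-testing: assigning a gaze point to a rectangular
# area-of-interest. All coordinates below are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AOI:
    name: str
    x: float  # left edge, px
    y: float  # top edge, px
    w: float  # width, px
    h: float  # height, px

    def contains(self, gx: float, gy: float) -> bool:
        return self.x <= gx < self.x + self.w and self.y <= gy < self.y + self.h


# Hypothetical layout for the diagnosis screen.
AOIS = [
    AOI("diagnosis", 40, 200, 560, 300),
    AOI("signature", 40, 520, 560, 60),
    AOI("contact_button", 40, 600, 560, 80),
]


def classify(gx: float, gy: float) -> Optional[str]:
    """Return the name of the AOI containing the gaze point, or None."""
    for aoi in AOIS:
        if aoi.contains(gx, gy):
            return aoi.name
    return None
```

A gaze point falling outside every rectangle is classified as None and excluded from AOI-level metrics.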

RESEARCH OVERVIEW
In the current research, we invited people aged 18-29 from Poland to take part in the experiment. This group was chosen for at least two reasons. Patients are generally disappointed with the health care system in Poland, and people aged 18-29 report the lowest level of satisfaction with the Polish health system. They are dissatisfied with the waiting time for the service and the time spent consulting with doctors. They are also more sensitive than the other age groups to the intelligibility of the information provided by staff and the possibility of establishing a dialogue (Siemens Healthineers, 2019).
We presented them with a fictional scenario in which we exposed them to information about a mobile medical application and a fictional diagnosis from that application. We decided to use an existing health care application, StethoMe, which was introduced by a Polish start-up (Grzywalski et al., 2019). This application is an AI-based system that a patient can use to continuously monitor the state of the body. Patients are dealing with two manifestations of AI: an algorithm that analyzes and interprets the test results, and a virtual assistant that communicates the result to the patient and manages their communication with the doctor. The application screens were modified for the purpose of this study; it did not conduct any medical measurement on the participants (they were only exposed to information about it and the results of an example medical measurement). Additionally, this application was virtually equipped with the Evie VA, based on a real VA offered by the Evie company (Kot & Leszczyński, 2020).
We set two experimental conditions, differing in the source of the diagnosis. Half of the participants (graduate and undergraduate students of management) saw the diagnosis from a human (doctor) while the other half saw one from the VA. Participants were randomly assigned to one of the two conditions. The experiment lasted about 10 minutes. It consisted of a stimuli exposition (the first screen presented initial information about the mobile app and the second the medical examination and diagnosis) and a battery of questions related to personal characteristics and metrics. In order to measure participants' visual attention and cognitive effort, we used a screen-based eye-tracking device capturing gaze data at 120 Hz. A 9-point calibration was carried out before the experiment started. When the experiment finished, participants received information about the purpose of the study and were debriefed.

Experimental Design and Stimuli
In order to test the hypotheses proposed, we used a single-factor experiment with two groups. The independent variable was the source of the medical diagnosis (human vs. VA). On the first screen, which contained an initial piece of information about the medical app, the manipulation was included in a short description of the main features of the app. The human-condition information contained claims that (1) a real doctor (we used a fictional name: "Ewa Gran") will provide any assistance that is needed, (2) a real doctor is responsible for sending the diagnosis to the user, and (3) a real doctor will provide all information in the app. In the VA-condition, all the features remained identical, except for the claim that a VA (we used the name "Evie") is responsible for (1) providing any help, (2) sending the diagnosis, and (3) providing all the information. The first screen also contained other necessary information about the application, which remained consistent across both experimental conditions.
The second screen presented to the participants contained the results of the health examination and information that one of these results slightly exceeds the norm. Regarding the results, participants saw a diagnosis from a human (vs. VA), suggesting that exceeding the norm is not significant and contact with a doctor is not necessary. We also informed participants that they might use the application to contact a human (vs. VA) to obtain more information. The only manipulation used on the second screen was related to the signature below the diagnosis and the information about obtaining additional information. Participants from the human-condition saw a signature from "Ewa Gran - the doctor" while those from the VA-condition saw one from "Evie - virtual assistant". The research procedure is illustrated in Figure 1.

Eye-Tracking Measurements
The data considered for the analysis were the time spent reading the information and the fixation count. Time spent (dwell time) offers information about the amount of time a participant has spent looking at a particular area-of-interest (AOI). A longer duration of time spent exploring an AOI may indicate one's motivation or the attention paid to processing a stimulus (Farnsworth, 2018). In the current study, it is assumed that the time spent reading information about the VA (vs. human) is longer, as participants are being exposed to a novel situation, and the duration of looking reflects their motivation to understand it. This assumption is supported by an extensive body of research regarding ease of comprehension as reflected by time spent looking at an object and eye fixations (Just & Carpenter, 1980; Kliegl, Nuthmann, & Engbert, 2006; Rayner & Reingold, 2015). For example, Becker (2011) demonstrated that both perceptual and conceptual difficulty in processing lengthen dwell time. This finding is supported by experiments conducted by Liu, Cole, Belkin, Gwizdka, and Zhang (2011) suggesting that longer dwell time indicates processing difficulty.
The fixation count is the second metric used in the current study to capture goal-directed eye movement (Krauzlis, Goffart, & Hafed, 2017). As emphasized in hypothesis 3, it is expected that VA participants are more willing to contact a real doctor, so they focus more on that button. The fixation count informs researchers about attention and interest in the button, and, as this metric is goal-directed, it is assumed to reflect the behavioral intention to contact a real doctor (Susac, Bubic, Kaponja, Planinic, & Palmovic, 2014). For example, the number of fixations is a well-established metric of ease or difficulty during the performance of problem-solving tasks (e.g., Gegenfurtner, Lehtinen, & Säljö, 2011), allowing the prediction of behavior and preference on the basis of fixations (Orquin & Loose, 2013).
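The two metrics above can be sketched as a simple aggregation over fixation events. The (AOI, duration) event format below is an assumption chosen for illustration, not the export format of the eye tracker used in the study.

```python
# Sketch: per-AOI dwell time and fixation count from fixation events.
# Each event is (aoi_name_or_None, duration_ms); the format is assumed.
from collections import defaultdict


def aggregate(fixations):
    """Return {aoi: {"dwell_ms": summed durations, "count": fixations}}."""
    stats = defaultdict(lambda: {"dwell_ms": 0.0, "count": 0})
    for aoi, duration_ms in fixations:
        if aoi is None:  # fixation landed outside every AOI
            continue
        stats[aoi]["dwell_ms"] += duration_ms
        stats[aoi]["count"] += 1
    return dict(stats)


events = [("diagnosis", 220), ("diagnosis", 180), ("contact_button", 250), (None, 90)]
```

For these invented events, `aggregate(events)` yields a dwell time of 400 ms and a fixation count of 2 for the diagnosis AOI, while the out-of-AOI fixation is discarded.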

RESULTS
An independent two-sample t-test was conducted (due to the relatively small sample, N = 38, 71% female; M age = 22, SD = 1.62) to determine the statistical significance of differences between the groups. The first step of the analysis concerned the perception of the introductory information about the app and its features. We took into consideration three AOIs related to the essential claims (presented on the first screen) that differ across the experimental conditions (VA vs. human). Statistical analysis revealed no statistically significant differences between the VA and human groups regarding time spent reading the information and fixation count (Figure 2).

The second step of the analysis was crucial to the test of the hypotheses proposed. First, the assumption about the time spent reading the diagnosis (H1) between the two experimental conditions (VA vs. human) was tested. Contrary to the predictions, there is no statistically significant difference in the time (in milliseconds) spent reading due to the provider of the diagnosis, M VA = 5400.47, SD = 3490.10 vs. M human = 6061.32, SD = 3205.21, t(36) = -.608, p = .401. Second, the assumption about the attention paid to the signature of the diagnosis provider (H2) was tested. Again, contrary to the predictions, there is no statistically significant difference between participants exposed to a diagnosis from the VA vs. a human diagnosis as measured by fixation count, M VA = 3.21, SD = 2.44 vs. M human = 3.27, SD = 2.22, t(36) = -.221, p = .836.
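The comparisons above are standard pooled-variance (Student) two-sample t-tests. A pure-standard-library sketch of the statistic follows; the function is illustrative and the data passed to it would be the per-participant AOI measures, not values reproduced here.

```python
# Sketch of the pooled-variance (Student) independent two-sample t statistic,
# as used to compare the VA and human conditions.
from statistics import mean, stdev


def t_statistic(a, b):
    """Return (t, df) for an independent two-sample t-test, pooled variance."""
    na, nb = len(a), len(b)
    # Pooled variance combines both groups' sample variances, weighted by df.
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return t, na + nb - 2
```

With 19 participants per condition, df = 19 + 19 - 2 = 36, matching the t(36) values reported above.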

Figure 2. Mean time spent on reading app features
The last analysis was conducted to test H3. It was expected that the contact-a-doctor button attracts more fixations among participants who are presented with the diagnosis provided by the VA (vs. human). The results obtained confirmed this assumption. The fixation count is significantly greater in the VA condition, M VA = 4.37, SD = 3.89 vs. M human = 1.79, SD = 2.64; t(36) = 2.392, p = .022. Results are presented in Figure 4.
** - statistically significant differences at p < .05. Source: Author

DISCUSSION
The results obtained demonstrate that the processing of introductory information about the medical app was the same across the two experimental groups. The analysis revealed that participants did not process information from the VA more eagerly than information from the real doctor (who provides the diagnosis to the user). It seems that, at this initial stage of interaction with the mobile app, claims regarding the use of AI systems do not evoke any concerns in the participants of our study. This is contrary to other studies that suggest limited trust, high levels of initial resistance, and perceived risk from radical innovations in the development process or at an early phase of diffusion (Heidenreich & Spieth, 2013). Moreover, with regard to the concept of e-trust (Taddeo, 2009), a digital environment does not meet the minimal requirements for trust to emerge. Thus, people exposed to AI-based systems might be more suspicious about relying on their recommendations (Ferrario et al., 2019). The dissimilarity of those results in comparison to our research may be explained by our research sample, which consisted of relatively young participants, who are more trusting of autonomous systems (Zhang, Na, Robert, & Yang, 2018). On the other hand, AI-based systems can still be novel for patients in the health care domain, so participants could interpret a virtual assistant as something atypical, thus causing concerns to arise. This assumption, however, has not been evidenced by the eye movement patterns.
The second step of the data analysis covered visual processing of the screen related to the examination and diagnosis. Three AOIs were defined according to the content of the diagnosis, the signature of the diagnosis provider, and the contact-a-doctor button. Contrary to our assumption (hypothesis 1), participants presented with the diagnosis from a virtual assistant did not spend more time reading the diagnosis. Hypothesis 2 was also rejected, as providing participants with the diagnosis from the VA did not enhance their focus of attention on the signature of the diagnosis provider. Crucially, a statistically significant difference in attention paid to the contact-a-doctor button was found for the fixation count. As expected (hypothesis 3), VA participants focused more on the button compared to those provided with the diagnosis from a real doctor. This is consistent with Promberger and Baron (2006), who claim that a recommendation from a computer enhances additional information search. In our scenario, participants could expect to receive more details from the doctor.
Despite the lack of differences in the processing of the essential features of the health app and the diagnosis itself, VA participants focused more on the AOI related to communicating with a doctor. This result is surprising, because if interaction with a VA evokes concerns about diagnosis reliability or accuracy, it should have led to greater cognitive effort from the test subjects during processing. A potential explanation may relate to an implicit preference for personal contact with a doctor when a patient suspects his or her health might be at risk. Our participants were informed that their results slightly exceeded the norm. Thus, they might not have been motivated to actively engage in processing the diagnosis. The discrepancy between the processing of the diagnosis and the focus on the contact-a-doctor button could result from a variation in the perception of the risk to their health, confirming that attitude to health is driven by one's locus of control (Steptoe & Wardle, 2001). Specifically, participants could deprecate the value of diagnosis information from both humans and the VA, and not actively engage in processing it, as they felt that their health was not at risk. The difference in attention paid to the contact button could therefore result from a general preference for personal contact with a doctor evoked by the message from the VA. This result might be explained by Longoni, Bonezzi, and Morewedge's (2019) findings that each patient perceives his or her diseases as unique, so they do not want to be treated by an AI-based algorithm as just one of many similar cases. Therefore, they seek a diagnosis from a doctor who will treat them as individuals. This is also coherent with research showing that the main fear patients have about AI is the lack of human oversight (Syneos Health Communications, 2018).

CONCLUSIONS
In the current study, the visual processing of a medical app providing participants with a fictional diagnosis of their heart activity was investigated. The results discussed above lead to three conclusions. First, having a medical analysis performed by systems based on artificial intelligence is acceptable for these participants, as long as the possibility of contacting a real doctor (human) about the result of the examination remains available. This result should be considered from the perspective of the skepticism associated with the high emotional cost of implementing automated solutions for patients. Patients place trust in their doctors and the different medical professionals who must interact to support the care of a patient (LaRosa & Danks, 2018). The introduction of AI-based health care applications can potentially have a significant impact on those relationships, which are based on trust. Thus, we suppose that AI systems are unlikely to replace human clinicians on a large scale, but rather will augment clinicians' efforts to care for patients, so they should be treated as assistive technologies. Over time, human clinicians may move toward tasks and job designs that draw on their uniquely human skills, such as empathy, persuasion, and big-picture integration (Davenport & Kalakota, 2019).
Second, the results lead to questions about human-machine interface design and communication methods, which can affect the level of acceptance of autonomous technologies (Hengstler, Enkel, & Duelli, 2016) and reduce the emotional necessity of contacting a human doctor. As the diagnosis in our study was presented without any room for human involvement, the acceptance of such technologies by patients goes beyond perceived usefulness, perceived ease of use, and attitudes towards use (Song, 2019). The method and form of communication refers, among other things, to the issue of anthropomorphism: according to De Visser et al. (2016), adding human features (appearance or social ability) increases trust resilience. Trust in a VA in health care can significantly affect the effectiveness of this type of solution.
Third, our observations may provide an exciting premise for further research if extended to investigate behavioral intention and its correlation with eye-tracking data. Another issue is the aesthetic appraisal of the virtual assistant, which could differ due to the level of anthropomorphism (MacDorman & Chattopadhyay, 2016). This kind of research could contribute to a better understanding of the adoption of AI in patient use of medical applications.
This research has some limitations that are typical of experimental research. Experimental designs create situations that are inherently unrealistic, and thus not crucial for participants, so the relatively robust health of the participants in the study could affect their propensity to engage in the experimental scenario and moderate their responses to the manipulations used. Therefore, this research is an initial study and requires replication. Nevertheless, it is suggested that such applications may be valuable in health care systems that have conditions similar to those of the Polish one. Given the current shortage of physicians, and the fact that many patients go to the doctor only to get assurance that they are healthy, diagnostic applications based on AI can contribute to reducing this particular problem.

Figure 1. Research procedure

Figure 3. Mean time spent on reading diagnosis