What should we know to develop an information robot?

Abstract

This paper is aimed at identifying the required knowledge for information robots. We addressed two aspects of this knowledge: 'what should it know' and 'what should it do'. The first part of this study was devoted to the former aspect. We investigated what information staff know and what people expect from information robots. We found that there are many similarities. Based on this, we developed a knowledge structure about an environment to be used to provide information. The developed knowledge structure worked well: in the field study, we confirmed that the robot was able to answer most of the requests (96.6%). However, regarding the latter aspect, although we initially replicated what human staff members do, the robot did not serve well. Many users hesitated to speak and remained quiet. Here, we found that knowledge for facilitating interaction was missing. We further designed the interaction flow to accommodate people who tend to be quiet. Finally, our field study revealed that the improved interaction flow increased the success ratio of information providing from 54.4% to 84.5%.

observed previously. Then, we asked them to freely provide as many functions as they would like information robots to have.

The interviews were recorded and transcribed for analysis. We categorized the different kinds of requests expressed by the visitors. For instance, some visitors' sentences were classified as recommendation (inquiry), because those visitors need to know more information than just a location.

Then, two coders who did not know the research purpose judged whether each transcribed sentence fit into the above-defined categories or not (sentences that did not fit were categorized as 'other'). The two coders' judgments matched reasonably well, yielding a kappa coefficient of .857.
Table 1 shows the coding result. The ratio of visitors who mentioned each expectation is listed in each row. Visitors could provide multiple answers, so the sum of the ratios exceeds 100%.

restaurants, 42 facilities, 6 event halls, 4 squares (e.g. Figure 3 right), 2 stages, and many offices.
The mall is mainly busy during weekends. Almost all shops are for non-daily goods, such as clothes, shoes, sports, and outdoor activities. We often observe people who look for shops and locations (e.g. they look at the floor maps and/or ask the service staff).

There are six types of information communicated in information dialog (section 3.3). Except for name of location, they are realized as the 'requestable property' class, which has subclasses 'item name', 'category', 'features', 'people activity', and 'people's state'. When a user requests some information, it is turned into an instance of 'requestable property'. Then, the location(s) having the same property will be searched. Each property item has wordings that are expected to be used in people's utterances. For instance, 'eat' (an instance of the 'people activity' subclass) is associated with wordings such as "eat", "have lunch", and "have a meal". Note that more complex requests (e.g. a "Japanese" restaurant with a "good view") can be represented as multiple instances combined with 'and/or' operators, but we did not implement such complex operations because users rarely made such complex requests.

Relationships between 'location entity' and 'requestable property'
Table 2 shows possible relationships between the two subclasses. For instance, some visitors could request a restaurant where they can have "pasta". To handle such requests, a "pasta" entity is prepared as an instance of the 'item name' subclass, which is associated with shops via the relation 'is served at'. Such relations are defined inside dialog management (section 4.3) as well. Note that an instance of 'requestable property' can be associated with multiple 'location entities' (e.g. "pasta" can be served at multiple restaurants).
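To make the structure concrete, the following is a minimal sketch (not the authors' implementation) of the knowledge structure described above: 'requestable property' instances carry expected wordings, 'location entity' instances are linked to properties, and a request is resolved by searching locations that share the requested property. All class, entity, and wording names here are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class RequestableProperty:
    subclass: str        # e.g. 'item name', 'people activity'
    name: str            # e.g. 'pasta', 'eat'
    wordings: list       # surface forms expected in user utterances

@dataclass
class LocationEntity:
    name: str
    properties: list = field(default_factory=list)  # e.g. linked via 'is served at'

def find_locations(utterance, locations):
    """Return names of locations whose properties match a wording in the utterance.

    Matching is naive substring matching for illustration; a real system
    would normalize the input and use the ASR output instead.
    """
    matches = []
    for loc in locations:
        for prop in loc.properties:
            if any(w in utterance for w in prop.wordings):
                matches.append(loc.name)
                break
    return matches

# Illustrative instances (one property can link to multiple locations)
eat = RequestableProperty('people activity', 'eat', ['eat', 'have lunch', 'have a meal'])
pasta = RequestableProperty('item name', 'pasta', ['pasta'])
rest_a = LocationEntity('Kaika-ya', [eat, pasta])
rest_b = LocationEntity('Cafe Terrace', [eat])
```

Here a query containing "have lunch" matches both restaurants through the shared 'eat' activity, while "pasta" matches only the restaurant that serves it, mirroring the one-to-many relationship noted above.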

In addition, the system reacts to greeting words. When an input matches words like "hello", it returns a greeting utterance. When an input matches leave-taking words like "bye", it returns leave-taking words and ends the dialog.
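A minimal sketch of this rule, assuming simple word matching (the word lists and responses are illustrative, not the authors' actual implementation):

```python
GREETINGS = {"hello", "hi"}
LEAVE_TAKING = {"bye", "goodbye"}

def respond_to_social_words(user_input):
    """Return (dialog_action, utterance); 'pass' means treat input as a request."""
    words = {w.strip("?!.,") for w in user_input.lower().split()}
    if words & GREETINGS:
        return ("greet", "Hello!")   # return a greeting utterance
    if words & LEAVE_TAKING:
        return ("end", "Goodbye!")   # return leave-taking words and end the dialog
    return ("pass", None)            # not social talk; handled as a request
```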

When no location is matched, the system explains: "(requested item) is not in this shopping mall. I only know about this mall."

as "restaurant" and "coffee". Sometimes, for features and people's activity, they add terms like "place for" (eat/lunch/play). Some ontology items are adjectives, such as "tired".

People sometimes only spoke such adjectives.

The above noun is used in "where is" questions, such as "Where is Kaika-ya (the name of a restaurant)?"

"I would like to" sentence:

People also use the form "I would like to" + verb + noun in requesting sentences, such as "I would like to buy coffee".
For all names, nicknames, and requestable properties, we automatically generated grammatical structures for ASR. Further, we added the following grammars. First, some basic verbs like "go" can be used in "I would like to" type sentences but were not included in the ontology (as they by themselves do not represent any specific request); we manually added these (8 verbs). Second, we added filler words, such as "well" and "ah", that appear before questions (12 words).
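The generation step above can be sketched as follows. This is a hedged illustration, assuming the grammar amounts to sentence templates over the ontology terms; the verb and filler lists are small illustrative subsets of the 8 verbs and 12 fillers mentioned, not the actual word lists.

```python
BASIC_VERBS = ["go", "buy", "eat"]   # illustrative subset of the 8 manually added verbs
FILLERS = ["well", "ah"]             # illustrative subset of the 12 filler words

def generate_patterns(terms):
    """Generate ASR sentence patterns for ontology terms (names, properties, etc.)."""
    patterns = []
    for term in terms:
        patterns.append(f"where is {term}")                       # "where is" question
        for verb in BASIC_VERBS:
            patterns.append(f"I would like to {verb} {term}")     # "I would like to" sentence
    # a filler word may optionally precede any of the questions
    patterns += [f"{filler} {p}" for filler in FILLERS for p in list(patterns)]
    return patterns
```

In a real grammar-based ASR, these templates would be compiled into the recognizer's language model rather than matched as strings.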

The ASR outputs the matched names, nicknames, or requestable properties, which are used by the dialog manager to determine the answer to be provided. When the ASR detects that the recognition is less reliable (because the input does not match well with its language model), the dialog manager prompts the user to speak again with utterances like "Could you repeat, please?" The ASR is deactivated while the robot is speaking.
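This handling can be sketched as a simple decision rule, under the assumption that the ASR yields a result together with a confidence score; the threshold value is hypothetical, as the paper does not report one.

```python
CONFIDENCE_THRESHOLD = 0.6  # hypothetical value; not reported in the paper

def handle_asr_result(result, confidence, robot_is_speaking):
    """Decide how the dialog manager reacts to one ASR output."""
    if robot_is_speaking:
        return None                        # ASR is deactivated while the robot speaks
    if confidence < CONFIDENCE_THRESHOLD:
        return "Could you repeat, please?" # low-reliability input: re-prompt the user
    return result                          # reliable input: pass on to answer selection
```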

We evaluated the system performance using this ASR implementation. We put the robot on a square of the mall (Fig. 3 right).

We conducted a preliminary study with the system reported in the previous section. We initially intended to supplement missing data and evaluate its performance. We found that the system itself worked well (we report this in section 6.3); however, interaction failed in aspects we had not considered. That is, some visitors responded in unexpected ways. In short, up to this point we had focused on the 'information' aspect, which we found to be satisfactorily prepared, but we found a problem in 'interaction'.

Here, we report two typical cases of failure. From these cases, with a trial-and-error approach, we seek the reason why interaction fails and search for a better pattern of interaction for the problem. Finally, we generate hypotheses about the missing knowledge in interaction (to be reported in the next section).

However, there are frequently people who stay in front of the robot without saying anything. Figure 6 shows one such case. A man stopped in front of the robot, and the robot was ready to receive a request, orienting its body and head toward him; but, without talking to it, he moved to the side of the robot, and the robot followed. He moved back, and it followed again.

Further, we noticed that the conversation got stuck when the robot asked for a request, even when the user had initially spoken to the robot. For instance, Figure 7 shows a visitor who engaged in greeting but became silent when prompted to make a request. She left after 5 seconds of silence after being prompted.

We interpreted that such people do not have concrete requests in mind, and thus they were stuck when asked to offer requests.

When the robot was placed in the mall, people sometimes stopped at it. We regarded such people who stopped at the robot as the participants.

Figure 8 shows the result of the study. Interactions were successful in 69.0% of cases in the with-self-introduction condition, versus 54.4% in the without-self-introduction condition.
A typical failure, like the one shown in Figure 6, was that visitors stayed in front of the robot but remained silent even when prompted to talk to it. Some visitors left in the middle of the conversation, and some explicitly said they did not need the service (6 cases in the with-self-introduction condition).

We applied a chi-square test to evaluate the ratio of successes against failures. There is a significant difference between the conditions (χ²(1)=4.755, p<.05, φc=.141).
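For reference, the chi-square statistic for such a 2x2 success/failure table can be computed with the standard formula. The counts below are illustrative only; the study's actual per-condition counts are not reproduced here.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]
    (rows: conditions; columns: success and failure counts)."""
    n = a + b + c + d
    # expected counts under independence: (row total * column total) / n
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return sum((o - e) ** 2 / e for o, e in zip([a, b, c, d], expected))
```

In practice one would use a library routine (e.g. scipy.stats.chi2_contingency) that also returns the p-value and supports continuity correction.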

Thus, Prediction 1 was confirmed. When the robot provides a self-introduction, the interactions end in success more frequently. We interpret this as follows: even though the robot serves an 'information' role, people need to share a common expectation of that role. Unless it explains its role, some people might fail to use it.

Thus, we made the following prediction:

Prediction 2: If the robot prompts a user for a request with questions they can easily answer, people will more frequently make requests to the information robot.

We applied a chi-square test to evaluate the ratio of successes against failures. There is a significant difference between the conditions (χ²(1)=5.678, p<.05, φc=.166).

Prediction 2 was consequently confirmed. When the robot's prompting was close-ended, the interaction was more frequently successful than with open-ended prompting. We interpret that, as predicted, many visitors did not have requests in mind and got stuck when asked to make one; instead, if the robot offers a prompting utterance that invites the user to talk about what they know (e.g. their destination), it can more easily continue the dialog and offer the information requested by the user.