1 Introduction

Robotics has experienced considerable growth recently and is expected to boom even further in the near future, with estimates of the personal robots market reaching 15 billion US dollars by 2015 (ABI Research Report on Robot Market Growth 2008, online at: http://www.thinkartificial.org/robotics/robot-market-2015). Furthermore, the application domains of robots have expanded beyond traditional ones, such as manufacturing and industrial robotics, to a vast number of new domains, including medical, search and rescue, military, educational, and home robots, all the way to specialist domains such as robots for the oil industry and demining robots. However, beyond the impressive figures and the new applications, in order to facilitate the harmonious introduction of robots to everyday life across the globe, it is important to study opinions and attitudes toward them, which might well exhibit considerable variation across cultures as well as other demographic parameters.

Cultural attitudes toward robots, and especially toward humanoid robots, are a subject that is still in its infancy. Existing studies are very few in number; they focus mainly on populations from the “West” (USA, EU, Mexico) and the “Far East” (Japan, Korea, and China) (Bartneck et al. 2007; Kaplan 2004; Nomura et al. 2007, 2008; MacDorman et al. 2009; Choi et al. 2008; Han et al. 2009), and explore only a small number of demographic parameters of differentiation. It is worth noting that, with the exception of Riek et al. (2010), the predecessor of the research presented here, there exist no other studies examining Middle Eastern attitudes toward humanoid robots.

Interestingly enough, the Middle East is quite an idiosyncratic place regarding cultural and religious beliefs that might be affecting attitudes toward humanoids: Islamic views regarding depictions of living beings, even more so for the case of statues and three-dimensional representations, though they have sometimes varied through times and places, have been interpreted in order to regulate the use of such images or objects across a variety of domains, ranging from the religious, toward the secular, private, or public. Thus, it is expected that quite possibly, one cannot generalize directly from results acquired from western countries or from the Far East, toward the special case of the Middle East.

For other types of technology apart from robotics, there is some existing work regarding Arabic cultural attitudes and their potential and actual effect on technology acceptance. For example, Straub et al. (2003) express the opinion that the country of origin of a technology often determines its basic cultural alignment, and thus a form of ethnocentricity results, which can create problems with technology acceptance in a different target country. Similar problems are reported by Albirini (2006), regarding how EFL teachers in Syria view information technology in the classroom. While most of the teachers viewed computers as a very useful means to improve Syrian education and as important for Syrian schools and society, they felt that computers were not particularly adapted to Arabic identity and culture, and that there would have been advantages if they were. The teachers felt there were many other issues to be addressed before effectively using computers in education, and that although computers might prove useful, they were expanding too quickly.

Thus, it seems that it is quite important to understand the cultural and social norms of a country, in order to enable harmonious acceptance of a technology, as Thomas (1987) and Rogers (1995) have also stressed. Also, in case of marked cultural mismatches, unwanted side effects might arise, for example creating resistance by potential users of the emerging technology.

The above exposition provides strong motivation to understand how Arabic people and other residents of the Middle East might view the prospect of having humanoid robots in their daily lives. Toward that purpose, in Riek et al. (2010) we developed a questionnaire to empirically examine attitudes and opinions toward robots in our target region. Originally we had considered administering previously used questionnaires, such as the NARS or RAQ questionnaires developed by Nomura et al. (2008). However, Syrdal et al. (2009) suggest that the internal consistency of NARS may be threatened in cross-cultural/cross-lingual studies, and thus we did not select this option. Furthermore, we wanted an instrument that better captured important areas of Middle Eastern life, such as community, domestic life, and education.

Thus, in Riek et al. (2010) we developed a new questionnaire called the Culture Education and Domestic Attitudes toward Robots (CEDAR) scale. Originally, we administered CEDAR in a one-day pilot study in a public setting where people could see and interact with an android robot to get an idea of its capabilities. For this pilot, we brought our robot, Ibn Sina, to a local mall in the United Arab Emirates. The early results of this study are reported in Riek et al. (2010). To acquire a larger sample and greater statistical power, we then administered the questionnaire during a 1-week public demonstration of the Ibn Sina conversational android robot at the highly popular Gitex exhibition in Dubai, the results of which we present and discuss in this paper.

Overall, we wanted to know if people’s attitudes were influenced by their demographics, including region of origin, college educational level, gender, or age. Our major findings, presented in Sect. 5, include a statistically significant ordering of preferred application areas for robots overall, as well as strong effects of the region of origin on the preferred applications. Furthermore, strong religion, age, and education effects were observed, as we will discuss.

The paper is structured as follows: We start with a background section, discussing relevant religious and cultural aspects of our population. Then, we continue with a description of the Ibn Sina android robot and its conversational system, and with a section discussing methodology. Extensive results follow, together with a discussion, and a concluding section.

2 Background

In this section, we will discuss some important religious and cultural aspects, relevant to the population sample that our questionnaire was delivered to. The main center of gravity is the problem of iconicity in Islam.

As mentioned in the introduction, opinions regarding the problem of iconicity and representational art might slightly differ, as it pertains to depictions of living beings, even more so for the case of statues and three-dimensional representations, in religious as well as secular settings.

When dealing with the attitude of Islam (meaning both the Islamic religion and the numerous and diverse civilizations that have been influenced by and built themselves around this religion) toward images, one has to make a distinction between the normative attitude prescribed by religious texts and the daily practice in a widespread geographic area and throughout centuries of history. It should also be added that the images that are considered problematic are those of beings provided with “breath of life” (ruh), i.e., humans and animals. From a normative point of view, the Islamic attitude toward images stems only marginally from the Koran (God’s word as it was revealed to Prophet Muhammad in the Islamic view) and mainly from the hadiths (words and deeds attributed to the Prophet). From a historical point of view, one has to consider the context in which Islam appeared: the inhabitants of the Arabian peninsula, besides some Jewish and Christian communities, were polytheists, adoring divinities of various types. These gods were represented in the shape of statues, or just symbolized by erected stones (ansab). Islam’s first priority was to defeat polytheism, as clearly appears in the Koranic text: the sin that God will never forgive is shirk, the association of other deities to the one God. This is the aim of the only verse that is (rarely) advocated as being at the origin of a “prohibition of images”: “Believers, wine and games of chance, idols [ansab] and divining arrows, are abominations devised by Satan. Avoid them, so that you might prosper” (S. 5, 90; transl. N. J. Dawood, Penguin). Images (sura) as such are not mentioned in the Koran, except in relation to the creation of the first man, Adam (S. 82, 8).

The hadiths are more explicit. It should also be stressed that there is no relevant difference between Sunni and Twelver Shiite hadiths.

Hadiths consider images problematic for two main reasons:

  (a) Images are impure and transfer their impurity to the place where they are to be found. Since purity is a condition for the validity of the Islamic ritual obligations, images cannot exist in places of worship. Images on carpets or cushions are allowed, because no one would think of adoring an object on which we walk or sit. Another exception is dolls: their usefulness for girls, who learn through them to become caring mothers, is superior to the damage they could bring.

  (b) The second reason stems from linguistic considerations: musawwir (the maker of sura(s), images) means painter but is also a term defining, in the Koran, the creative action of God, and it became one of his 99 names. The painter thus creates a world that God has not created, trying to compete with him. Therefore, according to the hadiths, he will be condemned to Hell with the injunction of breathing life into his creations, an impossible endeavor.

From these themes a majority consensus developed among religious scholars in classical times: images have to be banned from ritual practice, because of their impurity and out of fear of return to polytheism.

Nevertheless, figurative images were to be found in Islam, and—with a few exceptions—present all over the Islamic world in objects, house decorations, manuscripts, as a profane practice. Religious art expressed itself through calligraphy and ornament.

Images were costly and relatively rare until the nineteenth century, when the adoption of new techniques, like the printing press, photography and, later on, cinema, television, and internet, resulted in a “multiplication of images” that transformed the daily environment. Islamic scholars had to react and position themselves in regard to this “wave of iconicity,” as Western style painting, public monuments, photography, and movies became part of the lives of ordinary Muslims.

Although positions might vary, and depend on circumstances, there is a general consensus that:

  • Images are allowed when they are useful for education purposes and technological advancement, i.e., in teaching and other fields.

  • Public three-dimensional monuments of humans (or animals) are not lawful. In spite of this consideration, they do exist in many countries.

  • Painting is accepted under the condition that it would not represent unlawful subjects like nudes. Sometimes it is rejected as a practice of the wealthy.

  • Photography is generally admitted: it is considered that it only reproduces God’s creation, like a mirror that reflects an image. According to the general view, it is not a new creation. The same idea applies to cinema and TV.

Now, having discussed an important part of the cultural fabric that underlies our context, we will proceed with the system description of the conversational robot.

3 System description

The robot used in our experiment was made to resemble Ibn Sina, a well-respected Islamic philosopher, doctor, and polymath who lived from 980 to 1037 A.D. The robot is a central part of the Ibn Sina Theatre, an innovative augmented reality theater installation, with intelligent robotic and virtual characters (Mavridis and Hanson 2009), where supporting technologies for teleparticipation are also explored (Mavridis et al. 2011).

The Ibn Sina robot has 19 degrees of freedom in its face, and each degree of freedom is intended to represent the human musculature. Its facial movements and expressions are very life-like and natural. The robot also has two degrees of freedom in its arms and is able to move them up and down. Figure 1 shows a person interacting with Ibn Sina.

Fig. 1 A man giving the robot a traditional greeting

We built an end-to-end system that allowed people to talk to Ibn Sina in Arabic and have the robot generate an appropriate response. The system consisted of three main components: Speech Recognition, Corpus Searching, and Robot Expression Generation (see Fig. 2). In addition to the system described in this paper, other cognitive engines are currently being developed and transferred to the robot from other embodiments. These include the FaceBots engine (Mavridis et al. 2009), which utilizes online social information for more effective dialogs, and the Grounded-situation-models engine (Mavridis and Roy 2006), which enables situated language capabilities with reference to objects and events the robot is perceiving or is expecting after having heard descriptions of them.

Fig. 2 System overview: first a person speaks to the robot, next their speech is recognized, then query terms are extracted, the corpus is queried, and finally the robot is animated to show facial expressions and lip movement

3.1 Speech recognition

The speech recognition component of our system is based on the Acapela speech recognition engine (AcapelaGroup 2011). To facilitate speech with the robot, we modeled a number of different sentences that are common to daily life. Figure 3 shows a subset of the Arabic sentences our system can recognize, as well as the corresponding phoneme realization of the Arabic speech. Phoneme realization was necessary because Acapela uses a phoneme-based speech recognition technique. These phoneme realizations were generated by a fluent Arabic speaker, using the Lexical Editor tool provided with the Acapela system.

Fig. 3 A subset of the Arabic phrases our system can recognize and their phonemic realizations

The Acapela engine utilizes both an acoustic model and a language model. The acoustic model contains statistical representations of the sounds that make up each acoustic unit, while the language model contains the probabilities of sequences of words. Acapela provides options for two modes of recognition: isolated words and continuous speech. We configured our system to recognize continuous speech.

At the language model layer, we developed an artificial grammar that restricts recognition to the list of sentences modeled in the grammar. As a result, the effective speech recognition accuracy for our task was significantly improved compared to the accuracy that would have been obtained with a generic grammar.
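The benefit of restricting recognition to a closed sentence list can be illustrated with a toy sketch. It does not use the Acapela API or a real grammar; as a stand-in, it snaps a noisy text hypothesis to the closest entry of a small list using Python's standard `difflib` module. The sentence list (English placeholders rather than the Arabic sentences of Fig. 3), the function name `constrain`, and the cutoff value are all assumptions made for illustration.

```python
import difflib

# Hypothetical closed list of sentences; placeholders, not the paper's grammar.
GRAMMAR = [
    "what is your name",
    "how are you",
    "where are you from",
]

def constrain(hypothesis, grammar=GRAMMAR, cutoff=0.6):
    """Return the grammar sentence closest to a noisy hypothesis, or None."""
    matches = difflib.get_close_matches(
        hypothesis.lower(), grammar, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

# A slightly misrecognized utterance is mapped back onto a known sentence.
print(constrain("what is you name"))
```

Because every output must be one of a few known sentences, small recognition errors are absorbed, which is the intuition behind the accuracy gain described above.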

3.2 Ibn Sina corpus and search

In order to help facilitate a meaningful dialog, as well as to give Ibn Sina a bit of personality, we developed a corpus of phrases for Ibn Sina to say. All the phrases in the Ibn Sina Corpus (IBC) were written in the first person, and contained standard greetings (e.g., “Nice to meet you”), interesting anecdotes about Ibn Sina’s life (e.g., “I developed the physics equations that Newton used when developing his laws of motion.”), as well as a few humorous phrases (e.g., “I’m glad you learned something from me, I suggest you go read my book too.”).

All items in the IBC were encoded as UTF-8 text files. Further, we added related keyword synonyms to each file in order to ensure that an appropriate response would be generated by the robot when someone spoke to it. For example, for the question “What is your name?”, we had a file that contained keyword phrases such as “Name,” “Who are you,” etc.

In order to find phrases in the IBC, we used Google Desktop to index the directory containing all the text files, and wrote a C++ wrapper to send query terms and retrieve documents from it. Each phrase in the IBC was also converted into an MP3 file that contained a text-to-speech reading of the phrase. We used the Acapela Arabic text-to-speech engine to do this.
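The keyword-based lookup can be sketched as follows. This is not the Google Desktop index or the C++ wrapper described above; it is a minimal in-memory stand-in in Python, and the corpus entries, the response text "I am Ibn Sina.", and the function name `find_response` are hypothetical.

```python
# Hypothetical in-memory stand-in for the indexed corpus: each entry pairs
# keyword phrases with a response. The entries are illustrative only.
CORPUS = [
    {"keywords": ["name", "who are you"], "response": "I am Ibn Sina."},
    {"keywords": ["hello", "greetings"], "response": "Nice to meet you"},
]

def find_response(utterance, corpus=CORPUS):
    """Return the response whose keyword phrases best match the utterance."""
    text = utterance.lower()
    best, best_score = None, 0
    for entry in corpus:
        # Count how many keyword phrases occur in the recognized text.
        score = sum(1 for kw in entry["keywords"] if kw in text)
        if score > best_score:
            best, best_score = entry["response"], score
    return best  # None when no keyword phrase matches

print(find_response("What is your name?"))
```

Storing several synonyms per entry means that differently worded questions can still reach the same response, which is the purpose of the keyword files described above.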

3.3 Robot expression generation

Finally, after a user’s speech had been recognized and an appropriate response was found in the IBC, we animated Ibn Sina to speak the phrase, move its lips in synchronization, and make appropriate facial expressions, for example smiling while telling a joke or looking concerned when giving advice. The robot’s animations were done manually using Brookshire Software’s Visual Show Automation (BrookshireSoftware 2011).

4 Methodology

The implementation of the survey took place at the Gitex Exhibition in Dubai, United Arab Emirates. Our choice of location was mainly driven by the fact that it attracts people from a wide variety of cultures, income levels, job classes as well as backgrounds and thus it can, to a certain extent, guarantee a level of variation in the sample that could be thought of as representative of the non-worker class population in the United Arab Emirates as well as the wider Gulf Region. Thus, although we are not claiming that the Gitex sample is representative of the whole population of the UAE or the Gulf, it nevertheless provides, as we shall argue, an interesting sampling of people with many degrees of variation, which either chose to attend this highly popular exhibition or were working in technical, administrative, or labor positions in the exhibition. The popularity of the exhibition is such that people from many walks of life attend, as we shall see, while elaborating on the specifics of our population sample and representativeness in the next section.

Candidate participants of the survey were asked to participate by talking to Ibn Sina and completing the questionnaire shown in Fig. 4. The Ibn Sina robot was set up in the exhibition kiosk of Acapela, where the space is open and has numerous visitors. The robot was at the exhibition for 10 h a day, from 8 am until 6 pm, for 4 weekdays, Monday through Thursday. Regarding the decision to display a live demo of the robot with which humans could interact, we felt that, although this might have an effect on the results, it provided our subjects with an important experience of existing humanoids, giving them a tangible view of possible futures.

Fig. 4 The 11 main questions of the questionnaire

The choice of robot was not random; we consciously chose to use an Arabic-speaking humanoid, with an appearance which the people of the Middle East consider as “their own” or at least highly familiar, and furthermore with a name which points to one of the most well-known historical representatives of scientific achievements in the region—which again the people of the Middle East and south-central feel as very close to themselves. We chose this robot in order to counteract possible sentiments of “cultural alienation” and to provide a comparable “cultural customization” to the robots that are usually exhibited in other places: considering, for example the usually Japanese esthetic and character choices for the case of Japanese robots, etc. Of course, the question of what bias might have been introduced in our results as an effect of our choice of robot is a very interesting one, which we aim to explore in future work, where we intend to address the question of cultural preferences for appearance and behavior of robots. Notice, however, that the cultural customization that we performed by using the Ibn Sina robot was an important step toward providing a setting that matched the inherent cultural customization of robots used for example in studies in the Far East.

Anyone who came to see the robot or talk to the experimenters was politely invited to complete the questionnaire in English or Arabic, as they chose.

The main organizational axis for deriving results was the formulation of a set of questions, which could then be translated into experimental hypotheses and statistical methods for quantitatively evaluating them. The questions posed were:

  • (Qu1) Is there a preference ordering or partial ordering regarding application areas/estimated emotions of peers/educational applications of humanoid robots?

  • (Qu2) Are there significant differences between different demographic groups when it comes to answers to the questions posed?

  • (Qu3) What are meaningful alternative reclusterings of the demographic groups when it comes to their categories?

  • (Qu4) Are there strong predictivity patterns between answers to questions for specific demographic groups?

5 Results

5.1 Statistical test choice

With regard to the first question of interest (Qu1) examining the existence of preference ordering or partial ordering on the different application areas involved in the survey, we chose to start experimenting with a paired t test, which reveals mean values and confidence intervals for every question, which in turn can be compared in order to infer any preference ordering.

For the purposes of our statistical analysis and due to the characteristics of our sample, in order to decide the most suitable test statistic for each question of interest regarding the effect of demographics, we first employ normality tests on all different subgroups involved in each of the survey questions. The normality test is necessary in order to determine whether the standard ANOVA tests can give statistically robust results or whether, in the case where the null hypothesis of normality in our sample is rejected, one needs to adopt non-parametric statistics, able to handle such samples and provide results.

In order to investigate the magnitude of the effect, if any, of demographics on individual attitudes toward robots (Qu2), we use one-way ANOVA for the cases where the sample sub-categories pass the normality test, and the Kruskal–Wallis non-parametric approach when we reject the null hypothesis of normally distributed populations. Table 1 provides normality testing results for the survey questions.
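For readers unfamiliar with the non-parametric alternative, the following Python sketch computes the Kruskal–Wallis H statistic from pooled ranks, with average ranks for ties and the usual tie correction. It is a minimal illustration rather than the software used in this study: the function name is an assumption, the example scores are made up, and no p value is computed (H is compared against a chi-square distribution with k − 1 degrees of freedom for k groups).

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumes not all values are identical)."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank_of, tie_term = {}, 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of 1-based ranks i+1..j
        tie_term += t**3 - t
        i = j
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total**3 - n_total)
    return h / correction

# Made-up scores for two groups, for illustration only.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]))
```

Because the statistic depends only on ranks, it does not require the normality assumption that motivates the testing reported in Table 1.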

Table 1 Normality test for each question (Ryan-Joiner)

5.2 Demographics

Three hundred and fifty-five individuals completed the survey. The demographics are presented graphically in the Appendix. In more detail, 263 respondents were men, and 92 were women. 212 respondents chose to complete the survey in Arabic, and 143 in English. Out of the 355, 290 respondents had a college degree.

Respondents’ ages (Fig. 5) ranged from 13 to 60 years old; the mean age of respondents was 30.12 years (SD = 9.3). This age distribution corresponds quite well with the overall age distribution in the UAE (median age 30.1). Respondents came from a wide range of countries around the world. In particular, the detailed breakdown is the following: UAE: 80, Oman: 36, Saudi Arabia: 33, Jordan: 27, Iran: 26, India: 25, Palestine: 19, Egypt: 15, Syria: 12, Lebanon: 8, Sudan: 8, UK: 7, Pakistan: 6, China: 6, Canada: 5, Yemen: 5, Kuwait: 3, France: 3, Bahrain: 3, Algeria: 2, USA: 2, Libya: 2, Iraq: 2, Russia: 2, Morocco: 2, Philippines: 2, Australia: 1, Belgium: 1, Bolivia: 1, Chile: 1, Ethiopia: 1, Georgia: 1, Ireland: 1, Nepal: 1, New Zealand: 1, Tunisia: 1, Bangladesh: 1, Qatar: 1, Romania: 1, Nepal: 1. 307 respondents were Muslim (86.47%); the remaining respondents were Orthodox: 1, Catholic: 5, Christian: 11, Hindu: 14, Jew: 1, Roman Catholic: 3, Self-identified as “Chinese Religion”: 5 (corresponding to Confucianism, etc.), and 7 did not identify their religion.

Fig. 5 Age distribution of respondents

The questionnaire also asked “Did you talk one-on-one with Ibn Sina robot today? Y/N”; 267 respondents replied “Yes,” and 88 replied “No.”

One important design choice at this stage was how the demographic categories should be concatenated in order to make larger meaningful groups. For nationality and age, the breakdown was not as straightforward and self-evident as for the rest of the demographic characteristics (i.e., gender, college education, religion). The number of nationalities and different ages was so large that we deemed it necessary to group some of them together, to allow for more observations in each group and for results that would be easier to interpret.

For the breakdown of the age categories, we implemented two different groupings: (a) individuals under/over 30 years of age, because 30 was the average age in our sample, and (b) 5-year breakdown categories (i.e., 15–19, 20–24, 25–29 years old, etc.). Although there is no general consensus regarding our adopted break points, we feel that the first categorization can give some general result regarding the younger and the older portion of our population, whereas the second grouping could offer some more detailed results.
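The two age groupings can be expressed compactly. The Python sketch below is illustrative only; the function names and label strings are assumptions, and the 30-year cutoff is treated as "under 30" versus "30 or older".

```python
def age_split(age, cutoff=30):
    """Grouping (a): under the cutoff versus at or above it."""
    return f"under {cutoff}" if age < cutoff else f"{cutoff} or older"

def five_year_bin(age):
    """Grouping (b): 5-year categories such as 15-19, 20-24, 25-29."""
    low = age // 5 * 5
    return f"{low}-{low + 4}"

# Example ages chosen for illustration, not taken from the survey data.
for age in (19, 27, 30):
    print(age, age_split(age), five_year_bin(age))
```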

The variety of nationalities in our sample is concatenated based upon the regions to which they belong. Because several nationalities are very weakly represented in the sample, a broader categorization was judged necessary to allow meaningful statistical testing. The regions chosen are: Gulf, Sham, Africa, Southeast Asia, and Europe and Americas.
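The regional concatenation amounts to a lookup from nationality to region followed by a count per region. The Python sketch below shows the idea under a loudly stated assumption: the paper does not list which country was assigned to which region, so the partial mapping here is hypothetical and covers only a few countries for illustration.

```python
from collections import Counter

# Assumed, partial mapping for illustration only; the actual assignment of
# each country to a region is not given in the text.
REGION_OF = {
    "UAE": "Gulf", "Oman": "Gulf", "Saudi Arabia": "Gulf",
    "Jordan": "Sham", "Syria": "Sham", "Lebanon": "Sham",
    "Egypt": "Africa", "Sudan": "Africa",
    "UK": "Europe and Americas", "USA": "Europe and Americas",
}

def region_counts(nationalities):
    """Count respondents per region; unknown countries go to 'Unmapped'."""
    return Counter(REGION_OF.get(n, "Unmapped") for n in nationalities)

# Made-up list of respondent nationalities.
print(region_counts(["UAE", "Oman", "Syria", "Chile"]))
```

Keeping an explicit "Unmapped" bucket makes it easy to spot nationalities that have not yet been assigned to a region.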

Following the results of Table 1 on hypothesis testing regarding the distribution of our data, and given the categorization of each demographic characteristic, we had to run a normality test on each subcategory separately in order to decide which test statistic is most appropriate for the research questions under investigation (Table 2). When there are only two subcategories involved in a question and they are normally distributed, the ANOVA test is used. In all other cases the Kruskal–Wallis test is more appropriate, either to handle non-normally distributed data or because more than two subcategories are involved. Note: no normality test was run for nationalities represented by fewer than 4 individuals in the sample. The authors deem it unnecessary to run a normality test on 4 or fewer observations.

Table 2 Normality tests for each category and question

5.3 Quantitative results

(Qu1) Preference and/or partial ordering

The following box-plot diagram (Fig. 6) presents the mean and variance of the first four questions of the survey, according to which the preference ordering appears to be: Q2 > Q3 > Q1 > Q4, where “>” stands for “preferred to.”

Fig. 6 Preference boxplot of application areas (Q1–Q4)

However, this diagram is not able to directly show whether any of the above ordering relationships are statistically significant. For this reason, we perform paired t tests. Here the paired t test is appropriate since the sample including the first four questions of the questionnaire, presented below, passes the normality criterion. The partial ordering which results from the paired t test is shown in Table 3.
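As a reminder of what the paired t test computes, the Python sketch below derives the t statistic from the per-respondent differences between two questions: the mean difference divided by its standard error, with n − 1 degrees of freedom. The function name and the example scores are assumptions made for illustration; they are not survey data, and no p value is computed here.

```python
import math
import statistics

def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Made-up answers of four respondents to two questions.
t, df = paired_t([3, 4, 2, 4], [2, 2, 1, 3])
print(t, df)
```

A large positive t indicates that the first question received systematically higher scores than the second from the same respondents, which is how a pairwise "preferred to" relation can be supported.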

Table 3 Partial orderings of the results relating to the use of robots in domestic life

The results presented in Table 3 show a clear and statistically significant preference ordering when it comes to individual attitudes toward using robots in domestic life. In detail, we observe the following: all ordering results are statistically significant. In particular, out of the four different uses of robots in domestic life, individuals like least the idea of having their child instructed by a robot. On average, they slightly disagree with such a prospect. On the other hand, the idea of receiving help from a robot in housework activities attracts the most positive responses when compared to the other three activities (hospital, work, and school). On average, the responses to the statement “I wouldn’t mind if a human-like robot cleaned my house” are slightly above 3, which places individuals between slightly and strongly agreeing with such a prospect. For the other two domestic activities, individuals appear to be more open to the idea of having a human-like robot in their workplace (Ave = 2.91 ~ Slightly Agree) than to being treated by a human-like robot at the hospital (Ave = 2.29 ~ Slightly Disagree). The resulting preference ordering agrees with the one stemming from Fig. 6: Q2 > Q3 > Q1 > Q4.

The survey respondents appear to favor more the involvement of human-like robots in routine everyday tasks, which do not particularly involve social interaction, i.e., help in house work. However, a more “human” aspect of interaction becomes a less attractive prospect. In that respect, we see that having robots at their work place would be an acceptable thing; however, the interaction of human-like robots with individuals in hospitals as well as the possibility of having robots instructing children at schools do not appear to be practices favorable to the respondents, in general.

The general conclusion at which one arrives is that individuals have a positive attitude toward receiving assistance from human-like robots in daily routine tasks that do not involve human interaction; however, when the task does involve such interaction, e.g., receiving treatment at the hospital or being instructed at school, respondents have a negative response.

(Qu2) Are there significant differences between different demographic groups when it comes to answers to the questions posed?

All the results regarding the demographics are summarized in Table 4. Below, we conduct the detailed analysis of those results. Blank cells indicate no statistical significance. Bold letters indicate significance at the 5% level, whereas the rest indicate significance at the 10% level. A boxplot for all questions can be found in Fig. 7.

Table 4 Significant differences to answers due to demographics (2 dp)

Fig. 7 Boxplot for all questions

5.3.1 Regional results

Following the grouping of nationalities presented above, we investigate the effect of region of origin on individual attitudes toward several application areas of human-like robots. Our analysis identifies 3 significant results.

The first result points to a regional effect on responses of individuals regarding the statement “I wouldn’t mind if a human-like robot treated me at the hospital.” The non-parametric Wilcoxon Signed Rank Confidence Interval test reveals that individuals from Southeast Asia are the only regional group with a positive attitude toward the prospect of being treated by a human-like robot at the hospital. Respondents from Sham countries appear to have a “neutral” reaction (neither agree nor disagree with that statement). All other groups slightly disagree with the above statement.

We perform two-sample t tests on the answers of the different groups of individuals, testing the equality of the sample means. The following results were statistically significant: the scores of people from Southeast Asia and Sham countries are all higher than those of individuals from Africa, the Gulf countries, and Europe/Americas.

The second result, which turns out to be statistically significant, is the respondents’ reaction toward the possibility of having their children instructed by a human-like robot. This result is significant at the 10% level, and the breakdown of the results is shown in Table 7. Once again the respondents from Southeast Asian countries have the most positive attitude toward having their children instructed by robots at school. However, their median score is 2.50, which is neither positive nor negative. For all the other regional groups the answers are negative, with median scores being 2.00 or lower, which corresponds to “slight disagreement.”

The only statistically significant ordering result in Question 4 is that respondents from Europe and the Americas have the lowest scores compared to all other groups in their answers.

Third, for the statement “Children will enjoy learning from a robot like Ibn Sina,” we found a P value = 0.023, indicating statistically significant differences across regions of origin. Our results, presented in detail in Table 8 (appendix), show that people from Europe and the Americas agree more than all other regional groups (Gulf, Sham, and Asia); in absolute terms, they seem to strongly agree with this statement. Individuals from Sham countries come second in the “agreement” ranking for this question. The general conclusion from this question is that all groups either slightly or strongly agree. One thing to note here is the large difference in the responses of individuals from Europe and the Americas between Question 4 and Question 11, that is, between their own attitude toward having their children instructed by a human-like robot and their children’s expected attitude toward learning from a human-like robot. Clearly, the respondents recognize the potential impact that such a practice would have on their children, without necessarily agreeing with it.

With regard to this question, our two-sample t tests reveal that individuals from Europe and the Americas agree more than those from Sham, Southeast Asia, and Africa.

5.3.2 Age

First concatenation: 30 years old

We tested whether age has a statistically significant effect on respondents’ attitudes toward the Ibn Sina robot. We separated the respondents into two categories: individuals under 30 years old and individuals aged 30 or older. We find a statistically significant result regarding the statement “I wouldn’t mind if a human-like robot cleaned my house”; the P value = 0.054. In absolute terms, both age categories slightly agree. For individuals over 30 years old, the mean score is 3.13, and for individuals under 30 the mean is 2.88. However, the median values differ slightly, with the youngest group being more open to the idea of having human-like robots do routine housework. The non-parametric Wilcoxon Signed Rank Confidence Interval test results are shown in the appendix.

Second concatenation: 5-year breaks

For the second concatenation of age groups, we created 5-year breaks in the population. Due to the small number of observations in the two tails of our sample (youngest and oldest), we grouped together all individuals under 19 years and all people above 50. Regarding the possibility of being treated by a human-like robot at the hospital, a general result stemming from Table 10 (in the appendix) is that the youngest and oldest age groups, together with the individuals aged 35–44, are slightly opposed, whereas the other age groups seem to be more indifferent toward such a possibility.

With regard to the potential of having their children instructed by a human-like robot, we see that the age group 44–49 is the only one that clearly agrees with that statement, having a median score of 3.00. We could call all the other age categories “neutral” toward that statement, with medians between 2.00 and 2.50.

The last two results regarding the effect of age on individual attitudes toward human-like robots appear to be very similar. In particular, the statement “Elderly people would learn a lot from robots like Ibn Sina” has received, for the most part, responses with a median of 2.50, without much variation across age groups.

A similar picture holds with question number 10 “It would be easier to learn history from a robot like Ibn Sina than from a textbook.” In particular, most responses average between 2.5 and 3 without much variation. One can conclude that the response is overall neutral or slightly positive.

5.3.3 College education

Differences in education level appear to have an impact on attitudes toward the possibility of receiving treatment from a human-like robot at the hospital. The resulting P value is 0.035, identifying this result as statistically significant. Individuals with a college degree agree more than those without one with the use of robots in hospitals.

In particular, the non-parametric Wilcoxon Signed Rank Confidence Interval test points to the following facts: respondents with a college education appear to be on average indifferent to the prospect of receiving treatment from a human-like robot at the hospital. In detail, as presented in Table 14 (appendix), the median value of their responses is 2.5, which is characterized as indifferent since the value lies exactly between the scores of slight agreement and slight disagreement. On the other hand, individuals without a college education clearly disagree with that prospect, with a median value of 2.00 (slightly disagree) and a 95% confidence interval between the slight and strong disagreement scores.

5.3.4 Gender

We tested whether gender influences the scores of respondents. Since the data are not normally distributed, we ran the Kruskal–Wallis test and found two statistically significant results.
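For readers unfamiliar with the Kruskal–Wallis test mentioned above, the following is a minimal sketch of how its H statistic can be computed from ranked responses, including the usual correction for ties (which are frequent with Likert-type scores). The function name `kruskal_wallis_h` and the toy data are hypothetical; this is not the authors' analysis code.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    # H is built from the squared rank sum of each group.
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3.0 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = Counter(pooled).values()
    correction = 1.0 - sum(t ** 3 - t for t in ties) / float(n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical scores for two groups of respondents (toy data only).
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6]), 3))
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups − 1) degrees of freedom, which is how a P value would be obtained; that step is omitted here.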

First, for the statement “Many people from my home country would feel happy if they saw Ibn Sina robot,” the resulting P value is 0.002, indicating a statistically significant difference between men and women. In particular, the results indicate that women agree more than men that people would feel happy if they saw Ibn Sina. In absolute terms, the results are presented in Table 16 in the appendix. Both genders agree with the statement; however, women seem to agree more strongly.

Second, in the question regarding children’s attitudes toward learning from a robot like Ibn Sina, we also find a statistically significant result. The gender effect appears to be equivalent to the one discussed above, both in sign and in magnitude.

5.3.5 Conversation with Ibn Sina

The test statistics have identified two results regarding the effect of speaking to Ibn Sina on individual responses.

First, the question that asked “I wouldn’t mind if a human-like robot cleaned my house” returns a P value = 0.021 (which is below the critical value 0.05). According to the non-parametric Wilcoxon Signed Rank Confidence Interval test, the people who talked with Ibn Sina agree more than people who did not. Both categories agree with the statement; however, people who held a conversation with Ibn Sina agree more.

Interestingly enough, the second result appears to run somewhat counter to the finding above. In particular, we find that individuals who held a conversation with the Ibn Sina robot think that people from their home country would feel angry if they saw Ibn Sina. Absolute results of the non-parametric Wilcoxon Signed Rank Confidence Interval test are in Table 18, which can be found in the appendix.

Following the above result, we also find that individuals who did talk to the Ibn Sina robot believe that people from their home country would feel afraid if they saw it.

5.3.6 Religion

Religion appears to have a statistically significant effect on two application areas of the human-like robots, delivering treatment in hospitals and instructing children at school. For both areas of application, Hindu people have the most positive attitude with a median of 3.00 and 2.50, respectively. Regarding the use of human-like robots in hospitals, individuals of religions which are predominant in China (Confucianism, Buddhism, Atheism, etc. self-marked as “Chinese”) also have a slightly positive attitude, whereas Muslim and Christian respondents are indifferent and negative, respectively.

With regard to using human-like robots to instruct children in schools, all respondents except Hindu gave negative answers. Similar to the result presented above, we find that Hindus are in favor of having elderly people learning from human-like robots like Ibn Sina. All other religions appear to have more conservative attitudes toward this statement.

Muslim, Christian, and Hindu people believe that it would be easier for children to learn history from a robot like Ibn Sina than from a textbook. In that question, only respondents of Chinese religions seem to slightly disagree. Wilcoxon Signed Rank Confidence Interval results are shown in the appendix. The last result, regarding the expected attitudes of children toward learning from Ibn Sina, shows that Christian, Muslim, and Hindu respondents all slightly or strongly agree. However, individuals from a Chinese religious background disagree with that statement.

5.3.7 Language of the survey

The language of the survey does not appear to have any statistically significant effect on the survey answers.

  • (Qu3) What are meaningful alternative reclusterings of the demographic groups when it comes to their categories?

Regarding the third question of interest to our research, we ask whether there are other meaningful reclusterings for the demographic groups. Let us first explicate the general requirements that drove us to the chosen clusters. When clustering, we needed to strike a balance between: (a) a large number of clusters, which would enable us to detect potential differences in responses between various groups, and (b) clusters of a large enough size to be able to reach statistical significance. Requirements (a) and (b) are antagonistic, and thus require a suitable tradeoff. Furthermore, (c) we needed to choose demographic clusters that would exhibit a certain level of homogeneity when it comes to their responses to robots, in order to maximize the detected results. These were the three criteria used to derive the demographic clusters that were chosen. Details are provided below.

Regarding the categories gender, had a conversation with Ibn Sina, religion, have college education, and survey language, we believe that the clusterings are obvious, i.e., men/women, yes/no, etc. Therefore, only two clusterings could potentially be questionable, given that other reclusterings could potentially alter the findings. The first is age. It is obvious that each age could not by itself form an age category, since that would lead to more than 20 groups, and in that case each group would have very few observations. Different clusterings could also be proposed for age, e.g., 3-, 10-, or 15-year breaks. However, intuitively, we feel that the two clusterings chosen can reflect age differences in attitudes without being too narrow or too broad. Second, regarding the clustering of nationalities, again if each nationality is treated as its own group then there are several categories with very low representation in the sample (one or two observations). Thus, a meaningful clustering of nationalities would be one that accounts for some common characteristics of different nations. So, after initial experimentation, we devised the clusterings that were used. Geographic, cultural, religious, and other reasons lie behind our clustering of nationalities into regions. Other reclusterings are of course possible; however, our initial experimentation illustrated that the chosen clustering adequately satisfied the criteria that were posed.

  • (Qu4) Are there strong predictivity patterns between answers to questions for specific demographic groups?

The mathematical criterion chosen was symmetric uncertainty:

$$ {\text{U}}({\text{X}},{\text{Y}}) = \frac{2\,{\text{I}}({\text{X}};{\text{Y}})}{{\text{H}}({\text{X}}) + {\text{H}}({\text{Y}})} $$

After thresholding at 0.1 in order to detect considerable mutual predictivity, it was found that answers to the following pairs of questions were heavily mutually predictive: (Q5, Q6), (Q7, Q8), (Q10, Q11), i.e.:

  1. Estimates of feelings of Happiness were mutually predictive of estimates of feelings of being Comfortable

  2. Estimates of feelings of Fear were mutually predictive of estimates of feelings of being Angry

  3. Agreement that people would enjoy learning about history through the robot was mutually predictive with agreement that children would enjoy learning through the robot.
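To make the symmetric uncertainty criterion concrete, the following is a minimal sketch that estimates U(X,Y) from two lists of paired categorical answers, using empirical frequencies and the identity I(X;Y) = H(X) + H(Y) − H(X,Y). The function names and the toy answer vectors are hypothetical and serve only as an illustration; they are not the survey data or the authors' code.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of labels."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """U(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired answers of equal length")
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; nothing to predict
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Hypothetical paired answers to two questions (toy data only).
q_a = [1, 2, 3, 4, 1, 2, 3, 4]
q_b = [1, 2, 3, 4, 1, 2, 3, 4]  # identical answers: fully predictive
q_c = [1, 1, 1, 1, 2, 2, 2, 2]
q_d = [1, 2, 1, 2, 1, 2, 1, 2]  # independent of q_c in this sample
print(symmetric_uncertainty(q_a, q_b), symmetric_uncertainty(q_c, q_d))
```

A value of 1 means each answer fully determines the other, while 0 means the two answers carry no information about each other; pairs whose value exceeds the chosen threshold (0.1 in the text) are treated as mutually predictive.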

6 Discussion

To start this discussion, let us try to concisely revisit the results obtained. In short, they are:

(Qu1) Is there a preference ordering or partial ordering regarding application areas/estimated emotions of peers/educational applications of humanoid robots?

A1) Application Area Preference Ordering (Tables 3, 5)

Table 5 Main statistically significant dependencies

Q2 > Q3 > Q1 > Q4, with statistical significance, i.e.:

  • HouseClean Robot (Ave > 3) >

  • WorkPlace Robot (Ave = 2.91) >

  • Hospital Robot (Ave = 2.29) >

  • Child Instructor Robot (Ave = 2.16)

(Qu2) Are there significant differences between different demographic groups when it comes to answers to the questions posed?

Our findings toward this question are summarized below:

(Qu3) What are meaningful alternative reclusterings of the demographic groups when it comes to their categories?

Following the argument of the previous section, no important alternative reclusterings were noted, apart from those that were used in this study.

(Qu4) Are there strong predictivity patterns between answers to questions for specific demographic groups?

As noted in the previous section there is strong mutual predictivity of (Q5, Q6), (Q7, Q8), and (Q10, Q11), according to the symmetric uncertainty criterion that was chosen.

Now, having summarized the main findings, let us ask:

(Qu5) What are possible explanations behind our observations?

Application area ordering: People seem to be more positive toward accepting robots in application areas such as housecleaning and workplace robotics, where there is less emphasis on human interaction and critical high-level knowledge, as compared to having robots in the hospital (where there is the perception of direct potential danger to human life through a mistake, or there is the perceived need of a more “human touch”), and having robots instructing children (where there is the requirement for a trusted tutor).

Women more positive than men overall: There is the possibility that the strongly perceived male gender of Ibn Sina might create a cross-gender bias (women liking male robots more, and vice versa). Furthermore, as many of the questions in the questionnaire have to do with household assistance, in which women are usually more involved, especially in the Middle East, the possibility of offloading this work to a robot might be more appealing to them.

Regional effects: Regional effects were observed for the application areas of hospital and child instruction, as well as for the perceived enjoyment of children learning from robots like Ibn Sina. The overall pattern for Q1 and Q4 (application areas) is that respondents of Southeast Asian origin are more positive; and this effect is also in concordance with the effect of religion for these two questions. Thus, one possibility is that the cultural and religious substrate of Southeast Asian citizens, and especially Hindus, might be accounting for this difference. Furthermore, it might be the case that the high population density of places such as urban India, coupled with service shortages, might also contribute toward this positivity. Interestingly enough, regarding perceived children enjoyment while learning from robots like Ibn Sina, possibly due to the nature of this question (which deals with estimating emotions of children, in contrast to application area questions Q1–Q4), Europeans, Americans, and Shamis were more positive.

Age effects: The most marked effects that were observed have to do with the application area of house cleaning (people less than 30 are much more positive about robots helping them!), and elderly learning (where again, people aged between 20 and 24 were more positive). A possible explanation for the housecleaning bias might be that young people are less used to such chores, and their lifestyle is not really matched with enjoying such tasks. A possible explanation for the elderly bias might be that the age group between 20 and 24 (the age band of undergraduate studies) is usually thought of as most accepting of novelties, while they are also old enough to appreciate the benefits of lifelong learning for the elderly, and would possibly like to see their grandparents become more technology savvy.

College education effects: The only statistically significant result had to do with robots applied in hospitals. People without a college education are in slight disagreement with this prospect, while people with a college education are neutral. Possibly, this might be because a college education inhibits possible inherent fears and gives people a more empirical-proof-oriented attitude toward the application of new technologies in sensitive domains.

Mutual predictivity between pairs of questions: The first pair (Q5, Q6) corresponds to an underlying “positive valence” substrate for the “happy” and “comfortable” affective states, the second (Q7, Q8) to a “negative valence” substratum for “angry” and “afraid,” while the third (Q10 “History Learn” and Q11 “Children Enjoy”) could be reflecting the strong underlying association between the activity of learning and children/childhood as the most probable participants and age for the activity.

Religion effects: Religion has statistically significant effects on the largest number of questions (five), as compared to the other demographic categorical axes. These five questions were: once again the hospital and child instruction application areas (Q1 and Q4), as well as the three learning-related questions (elderly Q9, history Q10, children enjoy learning through robots Q11). The general observation is that Hindus seem to be more positive, followed by Muslims, while Christians often take a conservative attitude, especially when it comes to applications of robots in everyday life. However, this pattern does not always hold. These results are in overall agreement with the results on region, since there is partial concordance between regions and religious beliefs. Details are provided below.

First, regarding the two application areas (hospital and child instruction), Hindus are either slightly in agreement or neutral, while Muslims are neutral or slightly in disagreement, and Christians are slightly in disagreement or semi-strongly in disagreement. This could be explained by the decreased sensitivity of Hinduism to the uniqueness of humans as entities populating the universe, while Islam adopts a more human-centered view but with the possibility of overriding of hard rules if necessity toward the common good dictates so. Christianity, on the other hand, is usually coupled with a more cautious attitude toward the introduction of machines and the possible perceived replacement of humans.

Second, regarding robots and learning (Q9, Q10, Q11), the observed pattern was that, especially when it comes to learning about history as well as children learning from robots (Q10, Q11), Muslim respondents were quite enthusiastic, with a median standing halfway between strong and slight agreement, and in general more enthusiastic than the other groups. This can be partially explained by the fact that the particular robot in our study was associated with Ibn Sina, who is a historical figure exemplifying achievements of Islamic Civilization. Let us now look at the above results from a higher viewpoint, and try to speculate on possible explanations:

People from a Muslim background answering the survey seem to hold a median position between the largely positively reacting Hindus and the more critical Christians. This can be explained by a cultural attitude that, since the nineteenth century, has seen in technology the means of development and progress, and which is strongly present nowadays in the rapidly developing Gulf states. Therefore, technological developments are, with few exceptions, viewed as advancement, in contrast to the often technology-critical attitude that has arisen in many Western countries over the past few decades (and which could explain the rather negative reactions of Christians in the survey). While technology is generally perceived as value neutral, this is not the case with some of the changes in lifestyle that it implies. This could explain the relatively cautious attitude of Muslims regarding health care and children’s education, since the replacement of humans by robots would mean changes in family roles and social relations between age groups and genders. But could some of the reactions be traced back to the Islamic attitude toward the representation of human beings? This is of course difficult to state, since we have no precise data analyzing the question. However, since the nineteenth century Islamic scholars have stressed the notion of the utility of images for educational purposes and technical development, and such opinions have spread throughout the Muslim world, constituting a widely shared belief. The medium-positive responses to the Ibn Sina robot seem to confirm this. Therefore, the relative reluctance that has been expressed toward humanoid robots in the present survey should be understood as arising from the sociological issues pointed out above rather than from a theoretical point of view related to the question of images and representations.

7 Conclusion

In order to enable effective international customization of robot designs, given the predicted global increase of the role of robotics in our everyday life, and in order to facilitate their smooth and harmonious introduction to everyday life, it is important to study the opinions and attitudes toward robots in different regions of the world. Although there exists a small body of research covering the US, EU, and Asia, prior to this paper there was almost no research regarding attitudes toward robots in the Middle East, a region with its own marked cultural idiosyncrasies. Therefore, we brought Ibn Sina, an Arabic-language conversational android robot, to Dubai’s Gitex, one of the most important exhibitions in the region, and performed a questionnaire-based empirical study with 355 subjects from 38 countries, who had seen the robot interacting, and most of whom also interacted directly with it.

Many interesting findings were presented: First, a statistically significant ordering of preferred application areas for robots overall was found, as well as strong effects of the region of origin on the preferred applications. Our result presented in Table 3, showing higher preference for the use of robots in housework is in strong agreement with the findings in Han et al. (2009), which indicate that cleaning robots have highest marketability among individual-service robots.

The possibility of having education delivered by robots in classrooms appears to be an issue that raises sound disagreement. Our findings show that our respondents in the Middle East are negative toward such a prospect, as is the case in Han et al. (2009) with Japanese and Spanish parents, who prefer education to be provided by humans. In addition, Table 7 shows that respondents from the EU and the Americas have the strongest disagreement regarding the aforementioned statement, while respondents from Southeast Asia are neutral, and Gulf respondents have weak disagreement. This finding of non-positive attitudes toward robots educating our children is also in accord with existing work, namely (Choi et al. 2008; Han et al. 2009) for the EU and (Nomura et al. 2007) for the Americas, and so our work extends and refines previous findings, giving them a more global coverage and also exposing the local differentiations.

However, respondents seem to favor the idea that children would enjoy learning from a robot like Ibn Sina in general, in accordance with Bartneck et al. (2007). Notice though that this has to do with robots being used as an educational tool in addition to classical human tuition in the classroom—and of course, people are not positive when it comes to robots being the primary dedicated instructors of kids in school; as confirmed by our findings and previous work, and as commented above.

Strong religion and age effects were observed. Religion and age were also reported as significant determinants of attitudes toward robots in MacDorman et al. (2009); however, in that paper, religion effects were only speculated and not empirically confirmed. The gender effect on responses that we observed is also in accord with Bartneck et al. (2007), where female survey participants were more positive than their male counterparts. The positive effect of the experimental subject having interacted with the robot on the attitudes toward it, which we found for the case of engaging robots in housework, is additionally confirmed by Bartneck et al. (2007).

Finally, our finding that respondents from Southeast Asia agree with having robots in hospitals whereas European/American respondents disagree is in line with the finding in Nomura et al. (2007) where US students are found to “tend to more strongly assume that robots are more suited to tasks related to life-and-death situations.” (p. 38). Again, for the case of robots in hospitals, we extend our demographic to the Middle East and Southeast Asia, and we further refine these findings with regional differences.

Overall, the results presented are numerous and multi-faceted; they extend partial results of previous literature toward a wider range of demographic categories (religion, region, etc.), and provide much finer resolution, as well as totally novel findings. In conclusion, the results presented, together with the theoretical discussion of possible causes, provide interesting insights into the cultural acceptance of robots in this richly complex region. Such insights are expected to have strong implications for the wider application of robots in specific settings in the future, and thus might well be highly beneficial toward informing the design and deployment of robots around our globe.