1 Introduction

Voice user interfaces (VUIs) are ubiquitous in everyday life. They can be found in a wide variety of applications, such as smart home automation, information requests and navigation in cars. Users benefit from simple and fast usability and flexible interaction possibilities. VUIs can be used stand-alone in the form of statically positioned smart speakers or installed on (mobile) devices. Statista Research Department estimates that the number of voice assistants on various devices will reach 8.4 billion in 2024 (Statista Research Department, 2022). Various commercial (e.g., Amazon Alexa Skills Kit) or open source platforms (e.g., Mycroft AI) offer the ability to develop custom applications and provide prebuilt functions such as intent recognition, text-to-speech and speech-to-text engines.

Despite the popularity and acceptance of VUIs in everyday life and the availability of technical frameworks to create custom applications, their use in industrial fields of application, especially in manufacturing logistics, is still rare. We use the definition of manufacturing logistics by Strandhagen et al. (2017), who describe it as “the planning, control and configuration of logistics flow in a manufacturing company”, using “manufacturing” and “production” as interchangeable terms. Gorecky et al. (2014) highlight the increasing complexity and digitization of industrial production, resulting in a wide range of responsibilities for each worker. This includes tasks like the operation of new machines such as industrial robots, complex maintenance work, automated logistics systems and the training of new employees (Panetto et al., 2019). For data and information exchange, networked machines use digital components to create cyber-physical systems (CPS). Flexible, intuitive UIs are necessary to access these industrial CPS and to support the worker in the best possible way (Stocker et al., 2014). Nayyar and Kumar (2020) see a VUI as a good fit, as it offers fast and easy operation and a high degree of flexibility.

The aim of this review is to provide an overview of research approaches and applications of VUIs in manufacturing logistics. To this end, we formulated the three research questions shown in Table 1. In this way, we extract further research areas and research goals to support the establishment of voice-based UIs in industrial domains. For this purpose, we conduct a systematic literature review according to Webster and Watson (2002). In Sect. 2, further information is given on the use and barriers of VUIs in the manufacturing logistics environment. Section 3 presents the research process by describing the search query, the databases used and the exclusion criteria. In Sect. 4, we highlight similarities and differences in the literature results and identify and compare task areas, advantages and challenges of VUIs in manufacturing logistics. In Sect. 5, we discuss the results in the context of the research questions and set future research goals. Section 6 summarizes our findings and provides a conclusion.

Table 1 Research questions

2 Background

In modern, increasingly digitalized manufacturing logistics, people maintain a key role. There is a wide range of non-automated or semi-automated processes, such as order picking, interaction with production machines or the execution of manual production steps. Easy-to-use devices are needed to optimally support staff in their work (Stocker et al., 2014). While recent literature reviews on technical approaches such as Augmented Reality (Karomati Baroroh et al., 2021; Zigart & Schlund, 2020) and Motion Capturing (Menolotto et al., 2020) for industrial use are already available, a review on VUIs in manufacturing logistics is missing, although VUIs for industrial use are considered promising (Nayyar & Kumar, 2020).

VUIs have been used successfully in order picking for decades in the form of Pick-by-Voice systems (Miller, 2004). But despite their widespread use in everyday life, other practical applications in manufacturing logistics are almost non-existent. This is because the application requirements differ between “everyday use” and industrial environments. Rogowski (2013b), Wellsandt et al. (2020a) and Wellsandt et al. (2021a) give different reasons why VUIs are not widely used in industrial settings today (Table 2). As these are not just technical barriers, we also show knowledge barriers and economic barriers. Knowledge barriers concern points that are still unclear, especially regarding human–machine interaction and questions of benefits, responsibility in case of accidents and data security. Economic barriers relate to the high cost risk of system implementation and operation. However, most barriers are directly or indirectly due to technological challenges, such as a lack of robustness under industrial conditions.

Table 2 Explicitly mentioned barriers to the use of VUIs in industrial environments

Causal chains of barriers become apparent, as individual barriers are interdependent, as shown in Fig. 1. For example, the missing robustness under industrial conditions mentioned by Wellsandt et al. (2020a) leads to the weak reliability of prototypes mentioned by Wellsandt et al. (2021a), which results in lower acceptance among employees (Wellsandt et al., 2020a) and thus in a general lack of interest (Wellsandt et al., 2021a). It follows that barriers 1, 4, 5 and 7 in particular should be considered, as they are the cause of all the other barriers (Fig. 1).
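The idea of a root barrier can be illustrated with a small directed graph. The following is a minimal sketch that encodes only the single chain described in the paragraph above; the complete set of edges in Fig. 1 is not reproduced, and the function names `root_causes` and `downstream` are illustrative.

```python
# Cause -> effects, taken only from the single chain described in the text.
# The complete diagram (Fig. 1) contains more barriers and is not reproduced here.
causal_chain = {
    "missing robustness under industrial conditions": ["weak reliability of prototypes"],
    "weak reliability of prototypes": ["lower acceptance among employees"],
    "lower acceptance among employees": ["general lack of interest"],
    "general lack of interest": [],
}


def root_causes(graph):
    """Return the barriers that are never the effect of another barrier."""
    effects = {effect for targets in graph.values() for effect in targets}
    return sorted(node for node in graph if node not in effects)


def downstream(graph, start):
    """Return every barrier reachable from `start`, in discovery order."""
    reached, stack = [], list(graph.get(start, []))
    while stack:
        node = stack.pop()
        if node not in reached:
            reached.append(node)
            stack.extend(graph.get(node, []))
    return reached


roots = root_causes(causal_chain)
affected = downstream(causal_chain, roots[0])
```

In this reduced graph, the only root is the missing robustness, and every other barrier in the chain is reachable from it, which is the sense in which a root barrier "causes" the others.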

Fig. 1 Cause-and-Effect Diagram for barriers to the use of VUIs in industrial environments

3 Research process and research method

For the systematic literature review, we followed the approach suggested by Webster and Watson (2002). At first, we queried for the terms “voice assistant”, “speech assistant” and “manufacturing”, “maintenance”, “logistics” and “production” to get an overview of different industrial application areas for VUIs. After obtaining 7 relevant documents out of 164 query results in the Web of Science database, we added the term “voice user interface” and removed the “production” search term, as it is too general and not necessarily related to the industrial domain. The terms “Operator 4.0” and “Softbot” were also added, as they appeared in the first search results. Operator 4.0 describes the cooperative work between a skilled operator and digital, interactive systems to achieve a symbiosis between humans and machines or cyber-physical systems (Romero et al., 2016). As the softbot term encompasses both voice assistants and text-based dialogs, we decided to include text-based chatbots in the review, as they can be used as voice assistants via voice-to-text and text-to-speech engines. The first query showed that VUIs are often combined with other types of UIs, so the search in Web of Science was conducted across all fields (e.g., Topic, Title, Categories). In this way, publications with voice assistance as a side focus (e.g., as an additional interaction possibility) are also included. The final query is shown here:

((voice AND assistant) OR (speech AND assistant) OR (voice AND user AND interface) OR softbot) AND (manufacturing OR logistics OR maintenance OR (operator AND 4.0))
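To make the Boolean structure of this query concrete, the following minimal sketch evaluates it against the text of a single record. It assumes simple case-insensitive whole-word matching, which is not necessarily how Web of Science tokenizes or stems search terms; the function name `matches_query` and the sample strings are illustrative.

```python
import re


def matches_query(text: str) -> bool:
    """Evaluate the final search query against free text (simplified matching)."""
    # Tokens are either decimal numbers such as "4.0" or runs of letters.
    tokens = set(re.findall(r"\d+\.\d+|[a-z]+", text.lower()))

    def has(term: str) -> bool:
        return term in tokens

    interface_terms = (
        (has("voice") and has("assistant"))
        or (has("speech") and has("assistant"))
        or (has("voice") and has("user") and has("interface"))
        or has("softbot")
    )
    domain_terms = (
        has("manufacturing")
        or has("logistics")
        or has("maintenance")
        or (has("operator") and has("4.0"))
    )
    return interface_terms and domain_terms


hit = matches_query("A voice assistant for machine maintenance")
miss = matches_query("A voice assistant for smart homes")
```

A record must satisfy both groups: the second sample names a voice assistant but no industrial domain term, so it is not matched.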

Figure 2 shows the different steps of the search process in which the query is used. We defined exclusion criteria to focus on industrial applications, their specific circumstances, challenges and the benefits of using a VUI.

Fig. 2 Search process with exclusion criteria

Step 1 & Step 2: At first, we used the query above and retrieved 149 results. After filtering them by the exclusion criteria, 18 results remained.

Step 3 & Step 4: As suggested by Webster and Watson (2002), we collected all cited references (backward search) and articles citing the results (forward search). We removed duplicates and filtered by the exclusion criteria to obtain 28 additional search results.

Step 5: Finally, we used Google Scholar to retrieve publications which are not listed in Web of Science. In addition to many duplicates, we found and added 10 more relevant papers.
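The counts reported for the individual steps can be cross-checked against the 56 publications listed in the concept matrix. The sketch below tallies them and also shows one simple way to remove duplicates when merging result lists. Matching on a normalized title is an assumption made for illustration, not a description of the procedure actually used, and the titles in the usage line are placeholders.

```python
def normalize(title: str) -> str:
    """Lower-case a title and collapse whitespace so near-identical entries match."""
    return " ".join(title.lower().split())


def merge_unique(*result_lists):
    """Merge several result lists, keeping the first occurrence of each title."""
    seen, merged = set(), []
    for results in result_lists:
        for title in results:
            key = normalize(title)
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


# Included publications per stage, as reported in Steps 1 to 5.
included_per_stage = {
    "Web of Science query (Steps 1-2)": 18,
    "Backward and forward search (Steps 3-4)": 28,
    "Google Scholar (Step 5)": 10,
}
total_included = sum(included_per_stage.values())  # 18 + 28 + 10

# Placeholder titles, only to show the deduplication behaviour.
merged = merge_unique(["A  Paper"], ["a paper", "B"])
```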

4 Literature review results

4.1 Concept matrix by Webster & Watson

We divide the results according to the areas of application investigated and present different criteria in a concept matrix (Tables 3 and 4), as proposed by Webster and Watson (2002). The first column shows all 56 results, ordered by year of publication. The other columns contain concepts which are extracted and clustered from the literature found:

Table 3 Concept matrix for papers published 1991–2018
Table 4 Concept matrix for papers published 2019–2021

The application areas, as the first concept, serve as the starting point for an overview of industrial application scenarios for VUIs that have already been investigated. The publications are assigned to seven different areas. Control of robots, machines, transport vehicles describes the control of various devices through voice commands. This includes industrial robots, portable devices, machines and transport vehicles. The set-up of production machines and the control of transport play an important role in the internal flow of materials and in general production planning. Data/Information output includes all applications that provide information about machine and system status and other production-relevant key figures (such as order status, capacity utilization, disturbances). In Maintenance, VUIs are used in the form of interactive assistance to help the employee deal with various equipment failures, malfunctions or routine checks. In Employee Training/Learning-on-the-job, VUIs are used for the interactive training of employees. On the one hand, this takes place in special training environments; on the other hand, the worker receives the training directly on site at the actual workplace (“Learning-on-the-job”). Data collection includes studies that deal with the recording of information relevant to manufacturing logistics by employees on the shop floor. In Picking/Warehouse management, a VUI is used for (manual) picking tasks. In Quality assurance, products or components are inspected based on various quality parameters.

The concept of the Form of speech input in an industrial environment differs from everyday speech use and depends on the application. We distinguish two types of verbal interaction: Command-based and (Quasi-) natural. Command-based means a fixed vocabulary with a small number of words (e.g., for the verbal navigation through graphical UIs). Natural speech input involves the understanding of various formulations and intents. Rogowski (2012) adapts the natural speech input to the application by “some restrictions regarding the voice command structure” and calls it quasi-natural to reduce complexity but maintain intuitive usage. If a publication is not assigned, the publication does not indicate the type of input language used.

Positioning of the speech interaction device plays an important role in the process organization of manufacturing logistics. In particular, walking routes and the possible formation of queues (e.g., when several workers want to use a fixed-position VUI device at the same time) must be taken into account when planning the factory layout. Fixed positioning means that the voice interaction device is assigned to a fixed workstation or machine. Mobile VUIs are devices that are carried by the employee (e.g., by using a smartphone with a headset).

The concept of Additional User Interfaces was chosen because it concretizes the application context of the VUI and leads to certain advantages and challenges of the system. A distinction is made between (Depth) cameras, Gesture recognition and Haptic control as pure input devices, the Screen as a pure output medium and Mouse/keyboard (as they are only used in combination with screens), Touch screens and AR glasses as combined input and output devices. The additional user interfaces are either used together with the VUI or represent an alternative form of interaction.

To answer RQ2, we group several explicitly named Advantages of using a VUI. Hands/Eyes-free means that the hands do not have to be used to operate the VUI and the field of vision remains free. Intuitive usage ensures good usability of the VUI without time-consuming training. High speed of interaction describes the higher speed of use compared to other forms of interaction. Due to the hands- and eyes-free usability, VUIs can be used by people with visual or physical disabilities, as they are Barrier-free accessible. Flexibility means that VUIs can be adapted to a wide range of application areas and tasks. Reduced cognitive load contributes to good usability and prevents mental exhaustion. The advantage of Low cost refers both to the acquisition costs of the hardware and the operating costs of VUIs.

In addition to the advantages, numerous challenges to the use of an industrial VUI were highlighted in different publications. Environmental noise is particularly relevant in the industrial environment, where the use of machinery and tools can result in high levels of noise pollution. Other environmental disturbances are additional external influences that must not affect the functionality of the system, such as dust or humidity. The advantage of intuitive usage requires the implementation of Good usability and a suitable feedback system. This includes the Adaption to employee as individual, meaning that the VUI should be individually customized to each employee, taking into account factors such as current task, qualification and age. Easy integration into existing tools and systems is of great importance in the industrial context, since complex IT systems and machinery are usually already established and a smooth integration of the VUI is necessary to minimize production losses [cf. Brownfield challenge (Schmidt et al., 2018)]. Finally, the Understanding of multiple languages and dialects is required, as employees of different nationalities sometimes work together.

The research results based on the different concepts are explained in more detail in the following section.
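A concept matrix of this kind can be represented as a mapping from publications to their concept values. The following minimal sketch uses three placeholder rows (not actual entries from Tables 3 and 4) to show how per-concept counts are derived, and why counts for a multi-valued concept such as the application area can add up to more than the number of publications. The names `concept_matrix` and `count_values` are illustrative.

```python
from collections import Counter

# Placeholder rows for illustration only; these are not entries from Tables 3 and 4.
concept_matrix = {
    "Paper A (placeholder)": {
        "application": {"Control", "Data/Information output"},
        "input_language": "command-based",
    },
    "Paper B (placeholder)": {
        "application": {"Maintenance"},
        "input_language": "(quasi-)natural",
    },
    "Paper C (placeholder)": {
        "application": {"Control"},
        "input_language": None,  # the publication does not state the input form
    },
}


def count_values(matrix, concept):
    """Count how many publications are assigned to each value of a concept."""
    counts = Counter()
    for concepts in matrix.values():
        value = concepts.get(concept)
        if value is None:
            continue  # unassigned: the publication gives no indication
        values = value if isinstance(value, (set, list, tuple)) else [value]
        counts.update(values)
    return counts


application_counts = count_values(concept_matrix, "application")
input_counts = count_values(concept_matrix, "input_language")
```

Here the application counts sum to four for three publications, because one placeholder paper covers two application areas; unassigned input forms are simply left out of the count.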

4.2 Concepts

4.2.1 Application

We distinguish seven different types of applications. 29 publications investigate the control of devices via speech data processing. These include industrial robots, portable devices, machines and transport vehicles. In 14 papers, a VUI is used to output information about machine states, orders, workload or failures. This is either triggered by the system (e.g., when an error occurs) or in response to questions asked. Interactive speech assistance to support (machine) maintenance by providing interactive manuals is analyzed in 10 cases. Similar to maintenance, nine papers investigate voice-assisted employee training and provide interactive instructions on manufacturing tasks and machine operations. Three papers point to the potential of the voice assistant to record working times or general information on the shop floor. As picking tasks are already coordinated by speech-based systems (see Sect. 2), three further papers address the evaluation of existing systems. One paper shows the use of AR technology coupled with voice interaction for quality assurance.

4.2.2 Input language

Command-based input language appears 18 times and is mainly used to control machines, robots and transport vehicles. Angleraud et al. (2021), for example, use simple predefined commands such as “come”, “pick” or “give” to control a robot arm. (Quasi-)natural language is mainly used for data and information output tasks and appears 23 times. Gärtler and Schmidt (2021) perform complex information retrieval queries using natural language input, such as “What is the aggregation of the value for Baden in Aargau in Q3 2019?”
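The difference between the two input forms can be illustrated with a fixed-vocabulary matcher. The sketch below accepts only the three example commands named above and rejects anything else. It is an illustration of the command-based principle, not the implementation used in any of the cited publications, and the name `parse_command` is hypothetical.

```python
# Fixed vocabulary, using the three example commands named in the text.
COMMANDS = {"come", "pick", "give"}


def parse_command(utterance: str):
    """Return the recognized command, or None if the utterance is not in the vocabulary."""
    words = utterance.lower().strip(" .,!?").split()
    if len(words) == 1 and words[0] in COMMANDS:
        return words[0]
    return None  # free formulations would need (quasi-)natural language understanding


accepted = parse_command("Pick!")
rejected = parse_command("Could you pick that up?")
```

The second utterance contains a known command word but is rejected, because a purely command-based interface does not interpret free formulations or intents.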

4.2.3 Positioning

Depending on the application, there are different approaches to fixed and mobile positioning. For maintenance and commissioning tasks, mobile use is preferred. During maintenance, interactive instructions can be given directly on site, even for large technical systems and vehicles [e.g., aircraft maintenance (Bohus & Rudnicky, 2005), machine tool maintenance (Longo & Padovano, 2020)]. Fast manual picking tasks also require a portable voice assistant. A total of 31 systems are designed for mobile use. Fixed positioning is applied 18 times. It is mainly used in the control and information retrieval of large, statically positioned machines (especially industrial robot arms). Other applications include speech interaction in dedicated training environments (Wasfy et al., 2004), data collection and on-site training in manufacturing cells (Sim et al., 2006), and quality control (Wellsandt et al., 2020a).

4.2.4 Additional user interfaces

As mentioned before, the VUI is often used in combination with other UIs. 25 publications include at least one additional UI, depending on the application. Different modalities are either combined or offered as alternatives. Alternative interaction with touch displays is provided in nine cases, and screens with mouse and keyboard are used in seven cases. Visual output is used to display available voice commands in Rogowski (2010) or technical information (e.g., animations, video streams) for maintenance tasks in Bohus and Rudnicky (2005). In five cases, a VUI is used to interact with AR glasses. Gesture recognition of hand movements as an additional medium is used in four cases for parallel robotic control, as no hands need to be used to interact with the VUI. Three papers mention a parallel haptic control to control robot arms by moving them by hand (cf. Gustavsson et al., 2017). In two publications, cameras are used for object recognition: for task verification in a learning workplace (Costa et al., 2019) and for grasping tasks in robot interaction (Rogowski & Skrobek, 2020).

4.2.5 Advantages

The advantages are the reason why a VUI is used in manufacturing logistics applications in the reviewed publications. The ability to work hands- and eyes-free is particularly dominant and is highlighted in 25 publications. Manual manufacturing, picking and maintenance tasks in particular benefit from this. Fischer et al. (2017) also highlight the advantage that, in contrast to touch screens, work gloves can be kept on during verbal interaction. Furthermore, hands- and eyes-free control also favors the combination with other UIs. Fifteen papers emphasize the intuitive, natural usability of a VUI. In these publications, as far as indicated, a (quasi-)natural input language is used. The high speed of interaction compared to other types of UIs is emphasized by 13 papers. Miller (2004) mentions in this context the advantages of voice-based picking in cold environments (e.g., cold stores) compared to scanner-based picking, as gloves do not have to be taken off and ice crystals do not have to be removed from the scanner. Five publications highlight the ability of physically impaired employees to use the equipment. In five other papers, the flexibility of VUIs is highlighted, in terms of customizable data structures (Fischer et al., 2017; Longo & Padovano, 2020; Longo et al., 2019), the combination of different commands to control robot arms (Liu et al., 2018), and the handling of various maintenance tasks (Zhu et al., 2014). The reduction of cognitive load is addressed by three papers, and two papers point out the generally low cost of integrating a VUI into a production system. In summary, all advantages explicitly or implicitly lead to cost savings through faster and higher-quality work without much training effort.

4.2.6 Challenges

In comparison to everyday use, the industrial environment imposes different requirements on the use of speech assistance. The biggest challenge is the high ambient noise level caused by machinery and equipment, as mentioned in 21 publications. While intuitive usability is often indicated as an advantage, the implementation of appropriate systems, including the creation of an appropriate feedback mechanism, is considered challenging in 10 papers. The integration of the VUI into existing tools and systems is mentioned in seven publications. Wellsandt et al. (2020a) also see a challenge in the data protection of voice assistance systems. This includes, on the one hand, the appropriate integration of a user authorization system and, on the other hand, the lack of control over the processing of speech data in external cloud systems. The lack of adaptation to the tasks and qualifications of the employees is identified as a problem in six papers. Villani et al. (2021) point out the need for an appropriate form of interaction, especially with older employees and with regard to the inclusion of their working experience. Afanasev et al. (2019) highlight the importance of building trust towards VUI-related AI procedures. Additional environmental influences such as dust, machine vibration or humidity are mentioned in four publications. Three papers mark the correct understanding of jargon and dialects and the integration of multiple languages as important linguistic challenges in manufacturing logistics. Haslwanter et al. (2019) highlight this in the context of a shop floor with workers of several nationalities (Fig. 3).

Fig. 3 Extracted publications and their application by year (thematic overlaps are shown hatched)

5 Discussion

5.1 Research interest in VUI in manufacturing and logistics

Regarding RQ1, we have identified 56 publications between 1991 and 2021 on voice assistance in manufacturing logistics and found increasing interest in the topic (Fig. 3). There are several explanations for this. Digitalization in manufacturing has experienced strong growth in recent years (Statista Research Department, 2018). In order to create a suitable symbiosis between IT systems and employees, appropriate interfaces need to be implemented (Nayyar & Kumar, 2020). Furthermore, there is a constant growth in the number of users of VUIs in everyday life, making VUIs more popular and accepted (Statista Research Department, 2022). The number of industrial robots has increased significantly, which is why the topic of robot control with the help of a VUI is emerging as a trend (Statista Research Department, 2021). The need for easy interaction with the increasing amount of digital information explains the growing number of publications that use a VUI to output data and information (Statista Research Department, 2018). Less research has been done on data collection, order picking and quality control. For order picking, this can be explained by the Pick-by-Voice system, which is already successfully established in practice; only evaluations were carried out here. The increasing overlap of different application areas shows the flexibility and versatility of modern VUIs.

It should be noted that this review only considers publications that explicitly address the use of VUIs in manufacturing logistics. Publications without an industrial application focus, which may implicitly lead to further application areas, advantages or challenges of VUIs in manufacturing logistics, are not included.

5.2 Characteristics, challenges and solutions for using VUIs in manufacturing and logistics

We analyzed special features, advantages and challenges of using VUIs in manufacturing and logistics as part of RQ2. We have differentiated the input forms of the VUI into command-based and (quasi-)natural. It is noticeable that the intuitiveness and the advantage of reduced cognitive load of the VUI are emphasized in the publications in which the (quasi-)natural input form is used. This is probably because no fixed vocabulary has to be learned; instead, equivalent formulations are possible. The fact that quasi-natural language input has only been used since 2010 can be explained by the initially higher effort required in the field of natural language data processing (Rogowski, 2010). Problems such as the lack of multilingual language support and of intent recognition through natural language input processing in existing software packages are highlighted.

VUIs are combined with other forms of interaction in various publications (Fig. 4). The combination with graphical UIs occurs especially frequently. In this way, weaknesses of VUIs can be compensated for. On the one hand, the output of textual and graphical data was mentioned. This concerns, for example, video clips and animations in maintenance, which are difficult to communicate verbally (Bohus & Rudnicky, 2005). On the other hand, the concept matrix shows that the problem of loud ambient noise is rarely mentioned when the system offers further alternative input possibilities.

Fig. 4 Additional UIs used in combination with VUIs

In general, the high ambient noise level in industrial environments is the most frequently mentioned challenge (Fig. 5). Besides the use of additional UIs, further solutions were proposed: The Pick-by-Voice system described by Miller (2004) pre-records environmental noise to subtract it from later speech input. Pires (2005) uses a short spoken command structure and a noise-suppressing headset microphone. Chan et al. (2012) show a multichannel signal methodology to avoid misunderstandings caused by background noise. Rogowski (2012), Afanasev et al. (2019), Tsarouchi et al. (2016) and Wellsandt et al. (2020a) also suggest the use of noise-cancelling technologies for the microphone.
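The idea of subtracting pre-recorded noise from later speech input can be illustrated with magnitude spectral subtraction, one common technique for this purpose. Whether the cited Pick-by-Voice system works this way is an assumption; the source only states that noise is pre-recorded and subtracted. The sketch uses a naive discrete Fourier transform from the standard library, and the 16-sample frames with pure sine tones are toy signals chosen for illustration.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; only the real part is returned."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def noise_profile(noise_frame):
    """Magnitude spectrum of a pre-recorded noise frame."""
    return [abs(c) for c in dft(noise_frame)]


def spectral_subtract(frame, noise_mag):
    """Subtract the noise magnitude per frequency bin, keeping the noisy phase."""
    cleaned = []
    for coeff, noise in zip(dft(frame), noise_mag):
        magnitude = max(abs(coeff) - noise, 0.0)  # clamp so magnitudes stay non-negative
        cleaned.append(cmath.rect(magnitude, cmath.phase(coeff)))
    return idft(cleaned)


# Toy signals: "speech" in frequency bin 1, stationary "machine noise" in bin 3.
N = 16
speech = [math.sin(2 * math.pi * 1 * t / N) for t in range(N)]
noise = [0.5 * math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
noisy = [s + v for s, v in zip(speech, noise)]

cleaned = spectral_subtract(noisy, noise_profile(noise))
max_error = max(abs(c - s) for c, s in zip(cleaned, speech))
```

The toy case recovers the speech tone almost exactly because the noise is perfectly stationary and occupies a different frequency bin; real shop-floor noise varies over time and overlaps with speech, so the result would be less clean.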

Fig. 5 Challenges of using VUIs in manufacturing logistics

Other frequently mentioned challenges are good usability with an appropriate feedback system and the adaptation of the interaction to the employee's qualification and the task to be performed. Wellsandt et al. (2020a) mention the “Lack of human–machine conversation experience for industrial applications” as an explicit barrier (Barrier 5), which leads to further barriers for the implementation of VUIs in the industrial environment (Table 2; Fig. 1). The appropriate adaptation of interaction based on employee factors such as level of qualification or age in a manufacturing logistics context has hardly been investigated. In addition, there is a lack of evaluations in the real production environment. In most of the publications, the evaluation took place in laboratory environments, sometimes with test candidates without a manufacturing logistics qualification. Further field tests are necessary to improve the usability of VUIs in industrial environments and to adapt them to the working methods of experienced users. Moreover, user acceptance and user behavior under time pressure can only be tested accurately in this way (Wellsandt et al., 2020a).

5.3 Further fields of research

In the context of RQ3, the existing publications and the extracted advantages of VUIs point to further areas of application to be investigated.

5.3.1 Data recording

Data recording (e.g., for documentation or fault reporting of production processes), which was identified early on by Udoka (1991) and Sim et al. (2006) (and mentioned by Nayyar & Kumar, 2020), has not yet been investigated further, but offers high potential due to the demonstrated flexibility of VUIs. This provides the opportunity to create Digital Twins of manual processes and process chains. Job shops and small-scale manufacturers with a high diversity of manual processes in particular would benefit from this, with the advantages of hands- and eyes-free working, intuitive operability and high interaction speed.

5.3.2 Knowledge management

As de Bem et al. (2022) show, knowledge management still plays an important role in the digital transformation of the industry. Longo and Padovano (2020) propose to provide knowledge via voice assistance, but the creation of a knowledge base for such an assistant in an industrial setting has not yet been investigated. For this purpose, the possibility of decentralized recording of information via voice input using wearables should be used, so that the knowledge of different employees is included and merged. The creation of the knowledge base would benefit in particular from the flexibility and high input speed of a VUI.

5.3.3 Flexible work schedules

In industrial domains where a flexible work schedule is applied, the use of a VUI to quickly coordinate employees is promising. In particular, digital systems for automatic production planning and control are enabled to communicate directly with employees, giving new instructions or proactively requesting information from the shop floor.

5.3.4 User interface for disabled people

The accessibility of the VUI was mentioned as an advantage in five different publications, but there is no concrete research on its application in an industrial setting. Villani et al. (2021) highlight, for example, physical limitations in interaction that occur due to increasing employee age or as a result of accidents at work (e.g., sawing accidents in the wood processing industry).

6 Conclusions

This paper presents a systematic literature review on the use of VUIs in manufacturing logistics. Numerous studies have been published on the industrial use of VUIs. The increasing number of publications leads to the need for a systematic literature review to provide an overview of application areas, benefits, challenges and their solutions.

We identified seven different application areas in which workers are supported by means of a VUI. Machine control is clearly the dominant field of application, especially the control of industrial robots. As mentioned in Sect. 5.3, there are more applications in manufacturing logistics that would clearly benefit from the advantages of a VUI. Besides Advantages and Challenges, the three properties Input Language, Positioning and Additional user interfaces were examined, which play a special role for VUIs in the industrial context. The barriers to the use of VUIs in manufacturing logistics can be partially resolved in this context, but also motivate further research topics (Table 2; Fig. 1). The unclear benefits become clear through the listing of advantages (Barrier 4). The possibility of hands-free operation and the fast, intuitive usage are the most frequently mentioned advantages. The use of voice-based interfaces with industrial background noise and the lack of adaptation of the dialog to the respective qualification, task and habits of the employee are seen as critical. The suppression of background noise using filtering methods is suggested as a possible solution. It can be assumed that the robustness of VUIs is thus largely given (Barrier 7). The adaptation of the VUI to the employee and their task still needs to be investigated (Barrier 5). There are only a few practice-oriented evaluations so far, even though the evaluations showed promising results. Research results should be further validated by user evaluations in a real industrial environment to improve human–machine interaction. The challenge of efficient system integration (Barrier 1) still needs to be investigated, as mentioned in various publications.

There is a clear upward trend in the use of VUIs in production logistics. We expect to see more research into VUIs in production logistics in the coming years. VUIs provide a fast, intuitive way to interact that can be used hands- and eyes-free in a variety of industrial applications, minimizing distraction from the work task.