1 Introduction

Fifth generation (5G) wireless communication networks are rapidly being launched on a large scale worldwide [1]. The academic and industrial research communities have already started investigating the cellular technologies of the next decade, i.e., the sixth generation (6G) [1, 2]. Among these advancements, the semantic communication (SC) paradigm has received much attention [3,4,5]. SC has great potential to overcome technical problems in existing communication systems, based on the possibility of improved performance and effectiveness when communicating at the second level (complementing the impeding issues of the first-level communication identified by Shannon and Weaver [6]) in 6G.

6G networks are anticipated to follow a much more comprehensive strategy, catalyzing creative technologies and intelligent infrastructure while conducting prompt and highly effective data collection, transmission, learning, and analysis everywhere, at any moment in time [7]. In particular, 6G should concentrate on a modern concept of ubiquitous artificial intelligence for goal-oriented communication: an ultra-flexible infrastructure that brings human-level knowledge, semantic processing, and intelligence into all facets of communication networking.

With SC in 6G, the mechanism underlying SC can be semantic information that captures the significance and veracity of the original content (since the content may be both instructive and factual). However, defining such semantic information, or representing semantic characteristics with a specific mathematical model, is a highly non-trivial task, making a direct coupling with the first level (the Shannon/Weaver communication framework) infeasible.

Fig. 1
A high-level view of Industry 4.0 evolution and applications. In this work, we focus on the need for semantic communication to automate and orchestrate machines and robots. See the comparison of survey works later in Table 1

One of the main challenges when developing an AI-based SC network design for a realistic future 6G system is:

How to integrate distributed data processing and collaborative learning for semantic communication across an extensive range of heterogeneous wireless devices?

To this end, federated learning (FL) [8, 9] and Asynchronous Advantage Actor Critic (A3C) [10] are two evolving distributed AI approaches. They enable data-driven AI and machine learning (ML) over vast amounts of dispersed data residing on mobile devices, along with collaborative learning via knowledge sharing. Thanks to its capability to share knowledge bases and perform model training on diverse, potentially huge-scale networks while keeping all relevant information localized, FL has already drawn tremendous attention [11,12,13].

One of the representative industry verticals considered in this paper is smart manufacturing in industrial IoT networks. In particular, we focus on a cellular-based smart factory characterized by IoT integration and related services in intelligent manufacturing. We believe that advancing the ultra-reliable low-latency communication (URLLC) and massive machine-type communication (mMTC) services proposed in 5G by adopting the SC paradigm will be the two key enablers for future cellular-based factory automation [14].

Fig. 2
Industry 4.0 automation and orchestration rely on URLLC and mMTC. Without an ultra-reliable and low-latency link, industrial networks cannot collect or process the data created on the floor by IoT sensors or at the edge. This knowledge could not be used for surveillance or regular maintenance. Technicians are unable to see issues occurring in real time, so neither software-defined networking nor automation-based systems can work correctly. To ease the communication issues, we consider a 6G-based smart factory and focus only on the learning and automation of the Industry 4.0 system using edge intelligence

Smart manufacturing in Industry 4.0 marks the transition from traditional to highly connected digital technologies in industrial settings, as shown in Figs. 1 and 2. This may include the use of industrial IoT machines, intelligent computation, distributed learning, self-curing networks, and automated industry for smart manufacturing [15]. Such use cases demand highly reliable, minimal-latency, goal-oriented communication, which standard 5G approaches cannot support [2].

This research work has been motivated by the issue of how to make industry machines and robots, shown in Fig. 2, (i) learn (understand and adapt) efficiently in a new environment, (ii) perform human-level learning based on solid comprehension (semantic learning), and (iii) transfer their experience or share a knowledge base between sender and receiver so that all intelligent devices can use prior knowledge and semantics effectively.

We present an FL design for edge intelligence in Industry 4.0 systems to tackle the challenges mentioned above through a Continuous Federated Reinforcement Learning (CFRL) approach (extending relevant insights from [10, 16]). We suggest an information-fusion-based approach to update a standard global model and knowledge base implemented at the network edge. Then, effective transfer learning (TL) methods [17] are applied for consistent human-level cognitive intelligence and semantic learning that best fit Industry 4.0 systems.

Industry 4.0 relies entirely on seamless real-time communication. Therefore, we consider forthcoming 6G-based factory automation and orchestration using edge intelligence and modern machine learning for context. Without an ultra-reliable and low-latency link, industrial networks cannot collect or process the data created on the floor by IoT sensors or at the edge. Such an incomplete knowledge base (without background information, environments, and semantics) cannot be used for surveillance or regular maintenance. Technicians cannot see issues occurring in real time, so both SDN and automation-based systems often fail to work correctly. To ease the communication issues, we consider a novel SC-based smart factory envisioning 6G and focus only on the learning and automation/orchestration aspects of the Industry 4.0 system with edge intelligence.

1.1 Contributions

Our main contributions in this work are outlined as follows:

  • We develop a semantic communication framework with continuous federated reinforcement learning (CFRL) capabilities, drawing on successful features of human cognitive neuroscience and semantic learning. It enables industrial machines and robots to carry out continuous learning from data streams in edge-intelligent Industry 4.0 process automation/orchestration.

Table 1 Summary of the survey works on edge intelligence (cf. Fig. 1)

  • We design a new knowledge fusion method for learning from data streams aimed at IIoT automation and orchestration. It can integrate the previous experience, environmental factors, and expertise of continuously learning machines (revealing the semantic and pragmatic meaning) and create a standard paradigm for SC-based edge-intelligent manufacturing systems.

  • Two transfer learning methods are proposed and evaluated to allow machines to respond efficiently to new settings. The proposed methods and the CFRL framework are assessed with extensive experimental results.

A novel distributed, asynchronous reinforcement learning (recommender-like) system is developed in this paper using the asynchronous advantage actor-critic approach, combining ideas from TL, A3C, and FL. Our core idea is to keep machine preferences and interactions as local knowledge/learning and to adopt integrated local, on-machine models alongside a complementary global model. For example, the training procedure for the global model can be based solely on the loss gradients of the local learners.
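To make this concrete, here is a minimal sketch of the idea under simplified assumptions (least-squares loss, plain gradient averaging; all names are ours, not the paper's):

```python
import numpy as np

def local_gradient(weights, features, targets):
    """One machine's least-squares loss gradient, computed on its private data."""
    residual = features @ weights - targets
    return 2.0 * features.T @ residual / len(targets)

def federated_step(weights, machines, lr=0.01):
    """Only gradients travel to the server; they are averaged into one update."""
    grads = [local_gradient(weights, X, y) for X, y in machines]
    return weights - lr * np.mean(grads, axis=0)

# Toy usage: three machines, each holding private (features, targets) data.
rng = np.random.default_rng(0)
machines = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]
w = np.zeros(4)
for _ in range(100):
    w = federated_step(w, machines)
```

The design choice mirrors the text above: the raw machine data never leaves the device; only loss gradients reach the global model.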

2 Related works and theories

The Shannon and Weaver theory for reliable communication is essentially mature and established by now [25]. With 5G and advances in technology, all wired and wireless communication can be ultra-reliable. The communication performance at the first level can be enhanced quantitatively, but improving the performance qualitatively seems extremely challenging, close to impossible, simply by adjusting first-level communication parameters [26, 27]. Modern communication systems have smart endpoints, which implies intelligence, capability, and diversity. It is known that diversity invites (potential) misunderstandings [26]; therefore, we should now start investing in schemes/algorithms that make the endpoints more reliable at detecting/correcting misconceptions: the tenets of semantic communication.

One may consider a general model of semantic communication in which two intelligent interacting agents, a sender and a receiver, communicate with each other; the sender wants to accomplish some goal, while the receiver tries to help the sender. We can generalize this process at time t in terms of interacting agents as Agent: States (t) \(\times \) Inputs (t) \(\rightarrow \) States (t+1) \(\times \) Outputs (t). However, there is uncertainty associated with the receiver: the sender does not know the receiver and vice versa. This deserves more attention, as it creates a new class of problems and new challenges.
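For illustration only, this mapping can be written as a small typed interface (a sketch; all names are hypothetical):

```python
from typing import Protocol, Tuple, TypeVar

S = TypeVar("S")  # agent state at time t
I = TypeVar("I")  # input received at time t
O = TypeVar("O")  # output emitted at time t

class Agent(Protocol[S, I, O]):
    """Agent: States(t) x Inputs(t) -> States(t+1) x Outputs(t)."""
    def step(self, state: S, received: I) -> Tuple[S, O]:
        ...
```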

To this end, along the lines of other researchers [5, 6, 26], we believe that the focus should be on goal-oriented communication, where the sender always attempts to reach a predefined goal with the help of the receiver. However, "how shall we model and quantify this mechanism?" is an open problem in the field. Classical approaches that model/capture the dynamics by combining some function of the sender's/receiver's state with their interaction tasks fail in such a semantic-uncertainty setup, which mandates advice/suggestions from a third agent (such as a knowledge base). The knowledge base, for instance, can pose the tasks for the sender/receiver interaction and judge their success. It can act as a referee agent, keeping track of and determining whether the state evolution reflects goal-oriented achievements, thus incrementally building more experience for the knowledge base. In this way, semantic communication can deal with uncertainty at the endpoints, which is impractical using information theory fundamentals alone.

We attempt to bridge this gap using recent findings and the current state of the art in edge intelligence, with relevant insights from the above discussion. The edge intelligence literature is exceptionally rich (and potentially helps to build the knowledge base for the proposed semantic communication); see the summary of survey works in Table 1 and the references therein (viz. [18,19,20,21,22,23,24]). Of particular relevance to this work are the three closest theories: TL, FL, and A3C [10]. FL differs from usual distributed learning in that it uses non-IID, unbalanced, massively distributed data over limited communication resources.

A3C determines error using the rewards produced in state transitions, whereas FL determines error based on the error in the output (as in a general neural network); details are discussed below.

2.1 Asynchronous advantage actor critic (A3C)

The A3C algorithm developed by Google's DeepMind [10] made a splash by rendering the standard deep reinforcement learning (DRL) algorithms obsolete: it was faster, simpler, more robust, and attained much better scores on standard DRL tasks. In contrast to other DRL approaches, A3C functions well in both continuous and discrete action spaces. It has therefore become a highly reliable DRL algorithm for new, challenging problems with complex state and action spaces.

A high-level architecture of A3C is shown in Fig. 3, and the three A's of A3C are discussed as follows:

Asynchronous. In contrast to conventional DRL techniques, in which a single agent represented by a single neural network interacts with a single environment, A3C uses multiple incarnations of the agent (see Fig. 3) to facilitate the learning process. A3C maintains a global network, and each of several worker agents has its own set of network parameters. Each worker interacts with its own copy of the environment, independently of the other workers. This works better than having a single agent (beyond the speed of getting more work done) because each agent's experience is independent of the others'. As a result, the overall experience available for training becomes more diverse and is gathered asynchronously.

Fig. 3
An abstract view of the overall architectural design diagram for A3C

Advantage. From a policy gradient implementation perspective, the update rule uses the discounted returns from a series of interactions to inform the agent which of its actions were 'good' and which were 'bad'. The system can then be modified to adequately promote or deter actions. Using advantage estimates rather than merely discounted returns lets the agent determine not how good its actions were, but how much better they were than expected. Intuitively, this allows the system to concentrate on where the network's predictions were most lacking.

Actor-Critic. It provides the best of both policy gradient and Q-learning approaches. In A3C, both a value function V(s) (how good a particular state is to be in) and a policy \(\pi (s)\) (a set of probability outputs over actions) are estimated. Each will be a distinct, fully connected layer sitting at the top of the network. Pragmatically, the agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than conventional policy gradient methods.
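As an illustration of this two-headed design and the advantage-weighted update, here is a minimal PyTorch sketch (our own, with assumed layer sizes; not the authors' exact network):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared trunk with two fully connected heads: policy pi(s) and value V(s)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)

    def forward(self, state):
        h = self.trunk(state)
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h)

def a3c_loss(probs, value, action, ret, beta=0.01):
    """Advantage A = R - V(s): how much better the return was than predicted."""
    advantage = ret - value.squeeze(-1)
    log_prob = torch.log(probs.gather(-1, action.unsqueeze(-1)).squeeze(-1))
    policy_loss = -(log_prob * advantage.detach()).mean()    # actor update
    value_loss = advantage.pow(2).mean()                     # critic update
    entropy = -(probs * torch.log(probs + 1e-8)).sum(-1).mean()
    return policy_loss + 0.5 * value_loss - beta * entropy   # entropy aids exploration
```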

Our framework implements federated multi-machine learning through knowledge fusion. Federated learning was first proposed by Google, which demonstrated its effectiveness through experiments on different datasets [8]. In federated learning systems, the datasets are collected and stored at multiple network end nodes, while a learning model is trained from the decentralized datasets at a centralized global server [28]. Unlike the traditional learning method, where multiple edges learn simultaneously, our framework adopts A3C as the first training method and then fuses knowledge to reduce dependence on communication quality.

2.2 Transfer learning

As discussed above, upon identifying an expert machine, the learning agent combines the DQN model transferred from the expert machine [29] with its current native DQN model to produce an aggregate DQN model [30]. Given the transfer rate \(r\in [0,1]\), the new Q-learning agent can be mathematically interpreted as

$$ Q^{\text{new}}(s,a) = r\,Q^{\text{transfer}}(s,a) + (1 - r)\,Q^{\text{current}}(s,a). $$
(1)
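Eq. (1) is a convex combination of the two models; a one-function sketch, assuming a tabular Q representation:

```python
import numpy as np

def blend_q(q_transfer: np.ndarray, q_current: np.ndarray, r: float) -> np.ndarray:
    """Eq. (1): Q_new(s, a) = r * Q_transfer(s, a) + (1 - r) * Q_current(s, a)."""
    assert 0.0 <= r <= 1.0, "transfer rate must lie in [0, 1]"
    return r * q_transfer + (1.0 - r) * q_current
```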

In a distributed, cooperative, 6G-empowered IIoT system with multiple machines, the policy vector of all agents can be updated by:

$$\begin{aligned} \varPi _{ t+1}(s_t) = \begin{pmatrix} \pi ^1_{t+1}\\ \vdots \\ \pi ^i_{t+1} \\ \vdots \\ \pi ^M_{t+1} \end{pmatrix}= \begin{pmatrix} \arg \max _{a^1}\{Q^1_{t+1}(s_t^1,a_t^1)\} \\ \vdots \\ \arg \max _{a^i}\{Q^i_{t+1}(s_t^i,a_t^i)\} \\ \vdots \\ \arg \max _{a^M}\{Q^M_{t+1}(s_t^M,a_t^M)\} \end{pmatrix}. \end{aligned}$$
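In tabular form this joint update reduces to one argmax per machine; a minimal sketch, assuming each machine i holds its own Q-table indexed by (state, action):

```python
import numpy as np

def joint_policy(q_tables, states):
    """Stack pi^i = argmax_a Q^i(s^i, a) for machines i = 1..M into one vector."""
    return np.array([q[s].argmax() for q, s in zip(q_tables, states)])

# Toy usage: M = 3 machines, 5 states, 4 actions each.
rng = np.random.default_rng(0)
q_tables = [rng.random((5, 4)) for _ in range(3)]
print(joint_policy(q_tables, states=[0, 2, 4]))  # one action index per machine
```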

3 Our proposed methodology

3.1 CFRL framework for semantic communication with extended A3C

CFRL reduces the training period without losing accuracy of the decision functionality on industrial edge computing systems. CFRL uses the edge-computing configuration to learn the desired policies and goals. It comprises an edge server hosting a knowledge base, groups of settings, and one or more robots/machines. Moreover, we develop a federated learning framework using relevant insights from A3C for the fusion of local models into a global model shared at the edge server's knowledge base. The edge server maintains the fused local models in a mutual environment, and this mutual environment develops the aforementioned semantic communication capabilities as the settings change.

Fig. 4
A high-level view of the proposed CFRL framework. In an industrial environment, the robots/machines learn through reinforcement learning to automate specific new tasks in the new environment and obtain a private local model. Local models may also be the product of several robots, not only of one robot working in various settings. The robots then transfer their local models to the edge server's knowledge base, which evolves the global model by fusing local models into it. Motivated by the transfer learning approach, at the edge server's knowledge base, the robot/machine uses successor features to transfer the strategy to a new environment

As illustrated in Fig. 4, CFRL is a lifetime learning-cum-communication framework for industry network automation/orchestration systems. Compared to standard A3C, which simultaneously updates policy network parameters, the recommended information fusion solution is more appropriate for the proposed SC framework using the federated architecture. The presented technique is capable of fusing asynchronously evolving models for goal-oriented communication. The parameter-modifying method also imposes some environmental requirements, whereas the new knowledge fusion idea considers the embodiment of the semantics and therefore has very little or no environmental dependence. Using a generative network and a dynamic weighting approach, knowledge incorporation and building can be done evolutionarily, in lieu of A3C (which only produces a decision model during learning and provides recommendations).

Unlike A3C, the training environment in the proposed CFRL framework for the intended goal-oriented communications is highly dynamic, as shown in Algorithm 2. The state space of the agents can be considered uncountable as more training environments are uploaded. The structure of the policy's hidden layers can differ across actors, and even across network settings. Typically, training results are fused in the edge server's knowledge base. Based on the shared model, the robots are continually trained in new environments. Robots/machines and the edge server interact continuously through upload and download procedures. Therefore, the proposed SC framework is best suited to edge intelligence for future Industry 4.0 automation/orchestration systems where the environment is uncountable. In fact, the proposed framework can be applied from a lifelong learning perspective. However, the answer to the question "Why does natural (human) communication differ so much from such a designed communication?" requires further investigation and is beyond the scope of this work.
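A self-contained toy sketch of this upload/download cycle follows (class and method names are our own illustration, and local RL training is replaced by a stand-in update):

```python
import numpy as np

class KnowledgeBase:
    """Edge-server side: holds the global model and fuses uploaded local models."""
    def __init__(self, dim):
        self.global_model = np.zeros(dim)
        self.pending = []

    def fuse(self):
        if self.pending:  # plain average here; Sect. 3.2 uses confidence weights
            self.global_model = np.mean(self.pending, axis=0)
            self.pending.clear()

class Robot:
    """Machine side: trains locally, then exchanges models with the edge server."""
    def __init__(self, dim, rng):
        self.model, self.rng = np.zeros(dim), rng

    def download(self, global_model):
        self.model = global_model.copy()

    def train_locally(self):  # stand-in for the RL inner loop
        self.model += 0.1 * self.rng.normal(size=self.model.shape)

def cfrl_round(robots, kb, round_id, fusion_rate=5):
    for robot in robots:
        robot.download(kb.global_model)        # start from the shared model
        robot.train_locally()                  # learn in the robot's own environment
        kb.pending.append(robot.model.copy())  # upload the local model
    if round_id % fusion_rate == 0:            # fuse at a defined rate (Sect. 3.2)
        kb.fuse()
```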

[Algorithm 1 and Algorithm 2: pseudocode, rendered as figures in the original]

3.2 Federated learning for knowledge fusion

For a local server in a machine, the initial Q-network must be defined when the machine downloads the global model from the edge server's knowledge base. Such an initial Q-network is capable of reaching the goal and avoiding obstacles in specific automation/orchestration tasks. Observe in Fig. 4 and Algorithm 3 that CFRL reduces the training time for machines learning their automation/orchestration tasks. In our experimental setting, the edge server's knowledge base does not fuse a local model every time a machine uploads one but fuses at a defined rate. In Algorithms 1, 2 and 3, as shown in Fig. 4, we apply the computing flow of the CFRL algorithm for both knowledge fusion and model transfer learning [31].

It is known that uncertainty is the complement of confidence. Therefore, we use information entropy to define confidence in this work (along the lines of [16]). In particular, the confidence of a machine i (its information entropy) is given by [16, Eq. 1]

$$ \mathrm{Conf}_{i} := -\frac{1}{\log n}\sum_{j=1}^{m} \frac{\mathrm{Score}_{ji}}{\sum_{k=1}^{m}\mathrm{Score}_{ki}} \cdot \log \frac{\mathrm{Score}_{ji}}{\sum_{k=1}^{m}\mathrm{Score}_{ki}} $$
(2)

where n is the size of the local networks and m is the number of actions of a machine. Using Eq. (2), the memory weight of machine i and the knowledge fusion can be computed as [16, Eqs. 4-6]:

$$ \mathrm{d}w_{i} = \frac{1-\mathrm{Conf}_{i}}{\sum_{i=1}^{n}\left(1-\mathrm{Conf}_{i}\right)} $$
(3)
$$ \mathrm{Label}_{j} = \mathrm{Score} \times (\mathrm{Conf}_{1}, \mathrm{Conf}_{2}, \ldots, \mathrm{Conf}_{n})^{T} $$
(4)
$$ \varTheta^{\star} = \arg\min_{\varTheta}\frac{1}{M}\sum_{i=1}^{M}\big(y_{i}-h_{\varTheta}(x_{i})\big)^{2} $$
(5)

The training process minimizes the error \((y_i-h_\varTheta (x_i))^2\) (see Algorithms 2 and 3); Eqs. (4) and (5) constitute the main objective of the learning process.
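Eqs. (2)-(4) can be sketched as a short fusion pipeline; here we interpret Score as an actions-by-machines matrix, which is our reading of [16] rather than something stated here:

```python
import numpy as np

def confidence(scores, n):
    """Eq. (2): normalized entropy of one machine's action-score distribution."""
    p = scores / scores.sum()
    return -(p * np.log(p)).sum() / np.log(n)

def fusion_weights(confs):
    """Eq. (3): dw_i = (1 - Conf_i) / sum_i (1 - Conf_i); low entropy earns weight."""
    w = 1.0 - confs
    return w / w.sum()

def fused_label(score_matrix, confs):
    """Eq. (4): Label = Score x (Conf_1, ..., Conf_n)^T."""
    return score_matrix @ confs

# Toy usage: n = 3 machines scoring m = 4 actions each.
rng = np.random.default_rng(1)
scores = rng.random((3, 4)) + 0.1            # strictly positive scores
confs = np.array([confidence(s, n=3) for s in scores])
print(fusion_weights(confs))                 # per-machine memory weights
print(fused_label(scores.T, confs))          # one fused label per action
```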

[Algorithm 3: pseudocode, rendered as a figure in the original]

3.3 Transfer learning for sharing knowledge

Different methods have been proposed in the literature for transferring reinforcement learning. We consider two approaches applicable when a machine learns to automate a specific task. One promising approach takes the global model as the starting actor network; it attains a good score initially but is quite unstable. The other uses the global model as a feature extractor for transfer learning; this increases the dimension of the features and improves performance stably. One problem with the latter approach must be solved in the experiments: the structural difference between the input layers of the global and local networks.
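Both options can be sketched in PyTorch (our illustration; module names and dimensions are assumptions). The second class also shows where the input-layer mismatch arises, since the local head must accept the widened state-plus-features input:

```python
import torch
import torch.nn as nn

def parameter_transfer(local_actor: nn.Module, global_actor: nn.Module):
    """Approach 1: warm-start the local actor from the global parameters."""
    local_actor.load_state_dict(global_actor.state_dict())

class FeatureExtractorPolicy(nn.Module):
    """Approach 2: keep the global network frozen and use it as a feature source."""
    def __init__(self, global_trunk: nn.Module, state_dim: int,
                 feat_dim: int, n_actions: int):
        super().__init__()
        self.global_trunk = global_trunk
        for p in self.global_trunk.parameters():
            p.requires_grad = False               # the global model is not trained
        # The input layer grows from state_dim to state_dim + feat_dim: the
        # structural difference between global and local networks noted above.
        self.head = nn.Linear(state_dim + feat_dim, n_actions)

    def forward(self, state):
        feats = self.global_trunk(state)          # transferred features
        return self.head(torch.cat([state, feats], dim=-1))
```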

Following the former approach, we consider the DRL problem in factory automation/orchestration with an underlying Markovian decision process consisting of a state space, an action space, and a transition matrix. After taking an action in any state, a reward is received for the given policy and discount factor. The Q-function gives the expected discounted return of an action for any given policy \(\pi \),

$$\begin{aligned} Q^\pi (s, a)= {\mathbb {E}} \Bigg [\sum _{t=0}^{\infty }\gamma ^t R\big (s_t, a_t\big ) \Big |s_0=s, a_0=a,\pi \Bigg ] \end{aligned}$$
(6)

where \(R(s, a)\) is the reward perceived at state \(s\in {\mathcal {S}}\) when action \(a\in {\mathcal {A}}\) is taken, and \(0<\gamma <1\) is the discount factor. Assuming that the reward satisfies \(0<R(s,a)<1\), the optimal policy \(\pi ^{\star }\) has the Q-function \(Q^\star (s, a)\) satisfying

$$\begin{aligned} Q^\star (s, a)= R\big (s, a\big ) + \gamma \displaystyle \mathop {{\mathbb {E}}}_{\begin{array}{c} s'\sim T(s'|s,a) \\ a'\sim \pi (a|s) \end{array}}\Big [Q^\star (s', a')|s_t=s\Big ] \end{aligned}$$
(7)

where \(T=\{T^a_{s,s'};s\in {\mathcal {S}}, a\in {\mathcal {A}}\}\) is the transition matrix and \(T^a_{s,s'}\) is the transition probability from s to \(s'\). Eq. (7) is the Bellman optimality condition.

It is worth noting that the Bellman operator above is a contraction mapping whose unique fixed point is the optimal Q-function. We use the max-square error metric, \(\mathrm{M_{Sq}}(Q)\), to quantify the quality of Q-learning as follows:

$$\begin{aligned} \mathrm{M_{Sq}}(Q) := \max _{s,a}\big (Q^\star (s, a)-Q(s, a)\big )^2, \text{ where } Q(s, a) = R\big (s, a\big ) + \gamma \displaystyle \mathop {{\mathbb {E}}}_{\begin{array}{c} s'\sim T(s'|s,a) \\ a'\sim \pi (a|s) \end{array}}\Big [Q(s', a')|s_t=s\Big ] \end{aligned}$$
(8)

The intuition underneath Algorithm 4 is as follows. Whenever a new machine trained offline is installed into a new IIoT network under similar conditions, its optimal Q-function and evolution will be nearly identical. Using the transferred target, a better target, is therefore consistently helpful for accelerating convergence of the overall learning process.
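This intuition can be checked on a toy random MDP (our construction, not the paper's testbed): the optimality backup behind Eq. (7) is a contraction, so initializing from a transferred Q near the fixed point drives the max-square error of Eq. (8) below a tolerance in far fewer backups than a cold start:

```python
import numpy as np

def backup(Q, R, T, gamma):
    """Optimality backup: Q(s,a) <- R(s,a) + gamma * sum_s' T[a,s,s'] max_a' Q(s',a')."""
    return R + gamma * np.einsum("ast,t->as", T, Q.max(axis=0))

def m_sq(Q, Q_star):
    """Eq. (8): max-square error against the optimal Q-function."""
    return ((Q - Q_star) ** 2).max()

rng = np.random.default_rng(2)
A, S, gamma = 3, 5, 0.9
R = rng.random((A, S))                  # rewards in (0, 1), as assumed above
T = rng.random((A, S, S))
T /= T.sum(axis=-1, keepdims=True)      # rows are transition distributions

Q_star = np.zeros((A, S))               # approximate the fixed point
for _ in range(500):
    Q_star = backup(Q_star, R, T, gamma)

def backups_needed(Q0, tol=1e-6):
    Q, k = Q0.copy(), 0
    while m_sq(Q, Q_star) > tol:
        Q, k = backup(Q, R, T, gamma), k + 1
    return k

print(backups_needed(np.zeros((A, S))))                          # cold start
print(backups_needed(Q_star + 0.01 * rng.normal(size=(A, S))))   # transferred target
```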

[Algorithm 4: pseudocode, rendered as a figure in the original]

4 Performance evaluation

In this section, we address the following three critical questions.

  • How much can our SC framework reduce training time without losing automation/orchestration accuracy and effectiveness in 6G-enabled edge-computing systems?

  • How successful is the knowledge fusion algorithm in stimulating the global model and its goal-oriented evolution?

  • How successful are the transfer learning strategies for SC in exploiting typical patterns for a particular automation/orchestration task?

To answer the first question, we conduct experiments comparing the performance of the standard approach with that of the CFRL framework. For the second question, we compare the performance of generic models and the global model of transfer reinforcement learning. The third question is addressed by comparing the two transfer learning methods (Sect. 4.4).

4.1 Simulation test-bed setup

Four different training settings are considered to illustrate the differences between the comprehensive standard training approach and the CFRL framework, considering automation/orchestration in the navigation of a moving machine or robot. There are no obstacles except the walls in Environment 1. There are four fixed poles as obstacles in Environment 2, and four moving poles as obstacles in Environment 3. Environment 4 has both dynamic and static barriers. The observation range is between 0.1 m and 5 m.

4.2 Evaluation of CFRL framework

We carried out experiments in three environments; the output of CFRL is shown in Figs. 5, 6, and 7 alongside the performance of the generic methods. In Environments 2 and 3, CFRL increased the automation/orchestration decision accuracy and reduced training time for the 6G automated edge-intelligent system. The improvement, driven by the globally shared model, can be observed in Figs. 6 and 7.

CFRL is very effective in learning policies for all the SC constraints and industry automation/orchestration tasks considered. It enhances the ability of our learned model to generalize across commonly faced environments and situations. Experimental results show that CFRL minimizes training time without compromising accuracy in the automation/orchestration process for the proposed SC-based intelligent industrial network computing.

Fig. 5
With the training approach of Environment 1, CFRL has nearly the same result as the standard method. This is because no learning models are shared from old machines to new machines for the anticipated industry automation/orchestration

Fig. 6
In the training procedure of Environment 2, CFRL (black curve) demonstrated global model sharing. CFRL obtains better rewards in less time than the standard approach (red curve)

Fig. 7
In Environment 3, CFRL, which evolves the global model, demonstrated excellent results (black curve). Observe that CFRL obtains better rewards in less time than the standard approach (compare the red and black curves). However, CFRL does not perform as well as the standard method when the learning time exceeds 50 milliseconds

Fig. 8
Comparison of the two transfer learning approaches

Table 2 Results of the extensive experiments to demonstrate the efficiency of knowledge fusion in the CFRL approach

4.3 Evaluation of knowledge fusion approach

In Table 2, we present comparative and quantitative results of our knowledge fusion approach for the SC framework. Observe in Table 2 that the federated learning approach with the shared model reduces training time on a continual learning basis. In fact, standard process models are only capable of making excellent decisions in specific settings, whereas the CFRL-based model with knowledge fusion is capable of making the best decisions in a variety of settings. Overall, the presented information fusion method is successful and highly efficient for the SC framework.

4.4 Evaluation of transfer learning algorithm

We performed a quantitative analysis to validate and compare the two transfer learning methods. The results appear in Fig. 8. From the figure, it can be observed that both transfer learning strategies successfully increase the performance of reinforcement learning and hence boost the performance of the proposed SC framework. The parameter transfer approach has a faster learning speed (red curve), and the feature extractor approach has greater evolution stability (black curve).

5 Conclusion

We developed a novel edge-intelligence-based semantic communication framework for 6G. It is based on federated-learning-based 6G networking for the automation/orchestration systems of Industry 4.0 networks. The proposed intelligent networking approach can effectively use prior knowledge and adapt rapidly to a changing environment and system semantics. We also presented a knowledge fusion algorithm within the framework and provided insights into transfer-based learning. Our approach allows models to fuse as well as to evolve asynchronously through a shared knowledge base. We evaluated our design and the framework extensively using knowledge-base policy tests. It is worth noting that our method has constrained requirements and limitations with respect to a full-fledged SC of machines/robots. Extending the developed framework to flexibly handle the intended dimensions of SC requires further investigation and is left for future work. Nevertheless, the developed edge-intelligent 6G framework is an initial step toward offering a broader range of Industry 4.0 services, and further advancements are anticipated.