Electromagnetic-Informed Generative Models for Passive RF Sensing and Perception of Body Motions

Electromagnetic (EM) body models predict the impact of human presence and motions on the Radio-Frequency (RF) field originating from nearby wireless devices. Despite their accuracy, EM models are time-consuming, which prevents their adoption in strict real-time computational imaging and estimation problems, such as passive localization, RF tomography, and holography. Physics-informed Generative Neural Network (GNN) models have recently attracted a lot of attention thanks to their potential to reproduce a process by incorporating relevant physical laws and constraints. They can be used to simulate or reconstruct missing data or samples, reproduce EM propagation effects, approximate EM fields, and learn a physics-informed data distribution, i.e., the Bayesian prior. Generative machine learning represents a multidisciplinary research area weaving together physical/EM modelling, signal processing, and Artificial Intelligence (AI). The paper discusses two popular techniques, namely Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs), and their adaptations to incorporate relevant EM body diffraction methods. The proposed EM-informed GNN models are verified against classical EM tools driven by diffraction theory, and validated on real data. The paper explores emerging opportunities of GNN tools targeting real-time passive RF sensing in communication systems with dense antenna arrays. The proposed tools are also designed, implemented, and verified on resource-constrained wireless devices. Simulated and experimental analyses reveal that GNNs can limit the use of time-consuming and privacy-sensitive training stages as well as intensive EM computations. On the other hand, they require hyper-parameter tuning to achieve a good compromise between accuracy and generalization.

movements of human bodies can be interpreted using electromagnetic (EM) propagation theory considerations [9]. These EM methods have paved the way to several physical and statistical models for passive radio sensing, which exploit full-wave approaches [10], ray tracing [11], moving point scattering [12], and diffraction theory [13], [14], [15], [16]. The body-induced perturbations that impair the radio channel can thus be acquired, measured, and processed using model-based methods to estimate the location of the target and track it. A general EM model for predicting body-induced effects on propagation is still under scrutiny [17]. Moreover, current models are too complex to be of practical use in real-time sensing scenarios [10], [11], although they can be used for off-line applications such as network pre-deployment assessment [18].
Physics-informed generative machine learning is an emerging field in application contexts ranging from imaging [19] and EM field computation [20] to Bayesian estimation for inverse problems [21]. Generative deep Neural Networks (GNNs) can be trained to produce observations drawn from a distribution that reflects the complex underlying physics of the environment under study, or to reproduce approximate fields in an almost negligible time compared with classical numerical methods [22], [23]. For the first time, our paper discusses the adoption of GNN models designed to reproduce the effects of body movements on EM propagation, considering varying size, position, and orientation/posture of the body, multiple antenna (e.g., MIMO) setups, and different physical and geometrical properties of the radio link(s).
The proposed physics-informed GNN models are trained with samples obtained from EM models based on diffraction theory [15], [16], under different environment configurations. The GNN tools discussed in this paper are based on Variational Auto-Encoders (VAEs) [24] and Generative Adversarial Networks (GANs) [25], [26]. The opportunities and the limitations of each proposed approach are discussed and compared in several case studies targeting the perception of body motions and passive RF sensing applications.

A. RELATED WORKS
Physics-informed GNN models use Machine Learning (ML) methods to compute physical processes. Although still in their infancy, they have recently been proposed to approximate EM fields, but only a small body of work addresses this problem so far. An ML model is proposed in [27] to approximate the EM field in a cavity with an arbitrary spatial distribution of the dielectric permittivity. The model is shown to be one order of magnitude faster than comparable finite-difference frequency-domain simulations, suggesting possible applications in inverse problems. In [28], a neural solver for Poisson's equation is proposed using a purely convolutional neural network structure. An approach for solving Partial Differential Equations (PDEs) using Neural Networks (NNs) has recently emerged [29], where a physics-based loss function is constructed to improve NN training. Compared to traditional EM field computing methods based on numerical integration and/or meshes, an attractive feature of physics-informed models based on Deep NN (DNN) implementations is that they could break the curse of dimensionality [30]. In addition, once trained, DNNs can solve an EM problem in an almost negligible time compared with classical numerical methods [22], [23]. Finally, generation accuracy and training time can be improved by incorporating a small amount of labeled data or EM field measurements (if available) during the training process.
Applications of GNN tools to communications and localization are also emerging [31]. For example, [32] discusses a convolutional encoder-decoder structure that can be trained to reproduce the results of a ray-tracer, also encoding physics-based information about an indoor environment. An ML-assisted channel modeling approach is proposed in [33] to generate site-specific mmWave channel characteristics. The model is shown to improve the generalization capabilities of conventional physical-statistical models when adopted to reproduce complex network configurations. A Multibranch GAN (MBGAN) has recently been analyzed for radar signal processing to synthesize data that reflect human physical properties and kinematics [34]. The model is shown to provide a 9% increase in classification accuracy.

B. OBJECTIVES AND CONTRIBUTIONS
The paper discusses, for the first time, the adoption of EM-informed generative neural network models inspired by VAE and GAN tools [24], [25]. As depicted in Fig. 1, the GNN models are designed and trained to reproduce the human blockage effects on radio propagation as underpinned by scalar diffraction theory. The models consider different body and link configurations relevant in radio sensing: 1) varying link geometries, i.e., distance ($d$) and height from the ground ($h$); 2) Multiple-Input Multiple-Output (MIMO) antennas ($L$); 3) variable size ($w_{S,1}$, $w_{S,2}$, $h_S$), position ($x$, $y$), and orientation/posture ($\phi$) of the monitored target. The proposed GNNs comprise: i) a generator of body-induced RF signal attenuations, and ii) a generator of EM field samples, which can be used for EM full-wave propagation analysis. The generators are shown to reproduce human body blockage effects under configurations that might be unseen during the training phase, or difficult to predict through traditional EM field computing methods. The considered GNN models reproduce the body blockage effects according to selectable configurations: they generate EM field samples conditioned on input body characteristics, or features, that can be selected at run-time, namely during RF sensing deployment.
In line with initial studies [19], [22], [23], the generators are purely based on sequences of convolutional (and deconvolutional) layers whose number varies depending on the physical quantity to be reproduced. Therefore, they are well-suited for real-time localization applications, as they do not need intensive EM wavefield computations. VAE and GAN [22], [26] generator methods are analyzed in terms of their accuracy in reproducing EM diffraction effects, implementation complexity, generation times, and model size.
The paper is organized as follows. Section II introduces the passive RF sensing problem and motivations. Section III reviews relevant EM body models that quantify the human body blockage based on diffraction theory. Section IV presents the proposed GNN approaches and discusses VAE and GAN tools. Section V validates the proposed generators against EM diffraction and verifies the effectiveness of the models in reproducing body blockage effects on single- and multiple-antenna wireless receivers. Section VI discusses an experimental case study, whose goal is to demonstrate the effectiveness of GNN tools in reproducing real field measurements and in supporting RF sensing and passive localization. Finally, Section VII summarizes the open problems, the opportunities, and the limitations of the study.

II. BACKGROUND AND PROBLEM FORMULATION
The RF sensing goal is to extract the EM blockage effects of the human body(ies), $E_\theta$, from noisy measurements $S_t$ of the RF radiation observed at time $t$. The human subject(s) is characterized by an unknown state $\theta$, which is recovered from $E_\theta$. The body effects $E_\theta$ can be evaluated in terms of body-induced excess attenuations $A_\theta$ [15], [35], as in the example of Fig. 1. Baseband Channel State Information (CSI) $C_\theta$ [36] can be evaluated as well. The body state $\theta$ consists of an ensemble of features, e.g., body location, size, height, and orientation (see Section III-A) [6], [15], [16], which depend on the specific sensing application. In what follows, we provide the necessary background on RF sensing and Bayesian methods.

A. RF-SENSING AND BAYESIAN FORMULATION
The objective of the RF sensing inverse problem is to obtain the posterior distribution of the (unknown) human body blockage effects $E_\theta$, given the measurements $S_t$:
$$p(E_\theta \mid S_t) \propto p(S_t \mid E_\theta)\, p(E_\theta). \tag{1}$$
The Maximum A-Posteriori (MAP) solution to (1) extracts the most likely effects:
$$\hat{E}_\theta = \arg\max_{E_\theta}\, p(E_\theta \mid S_t), \tag{2}$$
from which it is possible to recover the subject state $\theta$ and any feature ($\vartheta \in \theta$) of interest, e.g., body position, size, height, and orientation. Field measurements $S_t$ can be in the form of received power, Received Signal Strength (RSS), or baseband CSI response [5]. Observations $S_t$ are perturbed by the body movements according to a prior distribution, $p(E_\theta)$, which predicts the effects of the body (i.e., the target) in the state $\theta$ as the result of the propagation of the reflected, scattered, and diffracted EM waves. The Bayesian approach (1) for solving the radio sensing problem (2) requires knowledge of the likelihood function $p(S_t \mid E_\theta)$, namely the RF measurement model, and of the prior distribution $p(E_\theta)$. The likelihood term depends on the data collection process as well as on the impairments introduced by the measurement instrument or by the environment. It is typically modeled as log-normal [2], [6]. On the other hand, the prior distribution $p(E_\theta)$, which models the initial beliefs on $E_\theta$, is usually hard to represent, as it often requires full-wave EM approaches. Approximate solutions, such as diffraction models [13], [16] and their variants [12], are in many cases too time-consuming to be of practical use in real-time sensing scenarios [11]. In addition, in practice, imperfect knowledge of the scenario, small involuntary body movements, or changing configurations of the propagation environment make the prior even more difficult to obtain with an acceptable level of accuracy [5].
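A minimal sketch of (1) and (2) for a scalar body effect, assuming (hypothetically) that the excess attenuation takes values on a finite grid of candidates and that the RSS, expressed in dB relative to the free-space power, is Gaussian around $-E_\theta$ (i.e., log-normal in linear scale). The function `map_estimate`, the grid, and `sigma_db` are illustrative names, not part of the original work.

```python
import numpy as np

def map_estimate(s_t, candidates, prior, sigma_db=1.0):
    """MAP estimate of a scalar body effect (excess attenuation in dB).

    Hypothetical measurement model: s_t = -E + w, w ~ N(0, sigma_db^2),
    with s_t the RSS in dB relative to the free-space power.
    """
    candidates = np.asarray(candidates, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # log-likelihood log p(S_t | E) for every candidate E (up to a constant)
    log_lik = -0.5 * ((s_t + candidates) / sigma_db) ** 2
    # unnormalized log-posterior: log p(S_t | E) + log p(E)
    log_post = log_lik + np.log(prior)
    post = np.exp(log_post - log_post.max())  # subtract max for stability
    post /= post.sum()                         # normalized posterior, as in (1)
    return candidates[np.argmax(post)], post   # MAP solution, as in (2)

cands = np.arange(0.0, 15.5, 0.5)              # candidate attenuations [dB]
prior = np.ones_like(cands) / cands.size       # uniform prior
e_hat, post = map_estimate(-6.2, cands, prior)
```

With the uniform prior used here, the MAP estimate coincides with maximum likelihood and returns the grid value closest to the measured 6.2 dB drop; an informative prior, such as one produced by a generative model, shifts the estimate toward a priori more likely attenuations.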

B. BAYESIAN PRIOR MODELLING OF EM BODY EFFECTS
The EM-informed GNN tools discussed in this paper are designed to reproduce the EM effects $E_\theta$ as sampled from the Bayesian prior probability distribution $p(E_\theta)$. The prior quantifies the uncertainties of the body effects $E_\theta$ caused by imperfect knowledge of the body state $\theta$. Conditioned on the nominal body features $\theta_k$, it is defined in general as:
$$p(E_\theta \mid \theta_k) = \int p(E_\theta \mid \theta)\, p(\theta \mid \theta_k)\, d\theta. \tag{3}$$
In other words, the EM effects $E_\theta$ are obtained for random instances of body features $\theta$ that follow a probability function $p(\theta \mid \theta_k)$, which models the uncertainty with respect to the nominal body features $\theta_k$. Some examples clarify the approach. First, consider the problem of generating body-induced RF excess attenuation values ($A_\theta$) for a subject located at some nominal position $\theta_k = (x, y)$ [5]. Involuntary movements, resulting from the complex structure of the human body, strongly affect the RSS [5] and must be adequately taken into account. Body motions can be represented by random movements in an elementary square area of size $\Delta$ around the nominal location $(x, y)$. Subject movements can thus be modelled by setting $p(\theta \mid \theta_k) = \mathcal{U}_{-\Delta/2,\,\Delta/2}$, i.e., uniform displacements around $\theta_k$, with $\Delta = 5$–$10$ cm [16]. Replacing $E_\theta$ with $A_\theta$, the prior becomes $p(A_\theta \mid \theta_k)$, namely the distribution of $A_\theta$ for $\theta$ drawn from $\mathcal{U}_{-\Delta/2,\,\Delta/2}$ around $\theta_k$.
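The marginalization in (3) lends itself to Monte Carlo sampling: draw $\theta$ around $\theta_k$ from the uniform displacement model and evaluate an EM model for each draw. In the sketch below, `toy_attenuation` is a hypothetical placeholder (a Gaussian-shaped attenuation profile across the LOS), not the diffraction model of Section III; in practice it would be replaced by the diffraction integral or by a trained generator. All names are illustrative.

```python
import numpy as np

def toy_attenuation(x, y, a_max=10.0, width=0.2):
    """Hypothetical stand-in for an EM body model: attenuation [dB] that
    decays with the distance y of the body from the LOS (x is ignored)."""
    return a_max * np.exp(-(y / width) ** 2)

def sample_prior(theta_k, delta=0.1, n=1000, rng=None):
    """Monte Carlo samples of p(A | theta_k): each draw displaces the nominal
    position (x, y) by independent U(-delta/2, delta/2) offsets per coordinate."""
    rng = np.random.default_rng(rng)
    x0, y0 = theta_k
    dx, dy = rng.uniform(-delta / 2, delta / 2, size=(2, n))
    return toy_attenuation(x0 + dx, y0 + dy)

samples = sample_prior((2.0, 0.0), delta=0.1, n=5000, rng=0)
```

A histogram of `samples` approximates the prior for the nominal state; with $\Delta = 0.1$ m and the body on the LOS, all draws fall between $10e^{-0.0625} \approx 9.39$ dB and 10 dB for this toy profile.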
Likewise, consider now the problem of subject activity recognition [36], which requires tracking, in real time, the subject trajectory and orientation w.r.t. the LOS path, together with their effects on CSI ($C_\theta$) measurements. Diffraction models [13], [16] can be designed to capture the rotation angle $\phi$ of the 2D target (see Fig. 2) by varying the size of the absorbing sheet $S$ that represents the body. However, this operation is often expensive in terms of computational time. Rather than simulating each target rotation angle separately, which is not feasible, the proposed generative model can be set to reproduce the CSI $C_\theta$ for all subject orientations simultaneously, namely by sampling from $p(C_\theta \mid \theta_k)$ over the orientation range $-\pi/2 \le \phi \le \pi/2$. Further examples are given in Section V.

III. EM BODY MODELS
The proposed GNN tools are optimized to match the prior distribution $p(E_\theta)$ in (3) using (few) training examples obtained from scalar diffraction theory [15], [16]. In this section, we discuss relevant diffraction-based EM body models that reproduce the human body blockage effects $E_\theta$, considering body-induced RF excess attenuations $A_\theta$ and CSI $C_\theta$ as special cases. First, we briefly recall the body models proposed in [15] for a single-link scenario based on scalar diffraction theory. Next, we consider a receiver equipped with an array, i.e., a Uniform Linear Array (ULA), of $L$ isotropic receiver antennas. The diffraction model represents the body effects relative to each radio link $\ell$, namely $E_{\ell,\theta}$. In what follows, we always assume that the monitored target is in the Fraunhofer region of both the transmitting (TX) and receiving (RX) antennas for all the considered links $\ell$. The extension to multi-target scenarios can also be derived according to [16].

A. DIFFRACTION MODELS FOR BODY-INDUCED EXCESS ATTENUATIONS
As depicted in Fig. 1, we assume that the length of the radio link is $d$, while $h$ is its height from the floor. The effects of the floor, walls, ceiling, or other obstacles are not considered; however, with some effort, these obstacles can be included, as shown in [43]. Scalar diffraction theory models the 3D shape of the human body as a 2D rectangular absorbing sheet $S$ [15] with height $h_S$ and a transversal size that changes according to a 3D cylinder view, with maximum and minimum transversal sizes $w_{S,1}$ and $w_{S,2}$, respectively. The target has nominal position coordinates $p = [x, y]$ w.r.t. the TX position, defined by the projection of its barycenter on the horizontal plane that includes the Line-of-Sight (LOS). The 2D target may also be rotated by an angle $\phi$ with respect to the LOS direction. The body/subject state $\theta$ is characterized by an ensemble of body features collected into the vector $\theta := \{p, \phi, h_S, w_{S,1}, w_{S,2}\}$.
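A distribution of Huygens' sources of elementary area $dS$ is located on the absorbing sheet $S$. The electric field $E_\theta$ at the receiver [15] is obtained by subtracting the contribution of the obstructed Huygens' sources from the electric field $E_0$ of the free-space scenario (with no target in the link area):
$$E_\theta = E_0 - j\,\frac{E_0\, d}{\lambda} \iint_S \frac{1}{r_1 r_2} \exp\!\left[-j\frac{2\pi}{\lambda}\left(r_1 + r_2 - d\right)\right] dS, \tag{4}$$
where time $t$ is omitted for simplicity. According to [15], (4) can be rewritten in terms of the field ratio $C_\theta = E_\theta / E_0$, representing the CSI:
$$C_\theta = 1 - j\,\frac{d}{\lambda} \iint_S \frac{1}{r_1 r_2} \exp\!\left[-j\frac{2\pi}{\lambda}\left(r_1 + r_2 - d\right)\right] d\xi_2\, d\xi_3, \tag{5}$$
where $\lambda$ is the wavelength. Notice that each elementary source $dS = d\xi_2\, d\xi_3$ has distances $r_1$ and $r_2$ from the TX and the RX, respectively, which depend on the relative coordinates $p$.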
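The double integral of the diffraction model can be evaluated numerically. The sketch below applies a plain midpoint rule on a regular grid rather than the tiled integration mentioned in Section IV-C, and assumes a hypothetical geometry: TX at $(0, 0, h)$, RX at $(d, 0, h)$, and a sheet orthogonal to the LOS ($\phi = 0$) standing on the floor. `csi_diffraction`, the grid step, and the default body size are illustrative choices.

```python
import numpy as np

def csi_diffraction(x0, y0, d=4.0, h=0.99, w=0.4, h_s=1.8, fc=2.4e9, step=5e-3):
    """Field ratio C = E / E0 for a rectangular absorbing sheet, scalar
    diffraction: C = 1 - j d/lam * iint exp(-j 2pi/lam (r1 + r2 - d)) / (r1 r2) dS.

    Assumed geometry: the sheet lies in the plane x = x0, spans y0 +/- w/2
    horizontally and 0..h_s vertically; the link is at height h."""
    lam = 3e8 / fc
    # midpoints of the elementary areas dS = dxi2 * dxi3
    xi2 = np.arange(y0 - w / 2 + step / 2, y0 + w / 2, step)
    xi3 = np.arange(step / 2, h_s, step)
    Y, Z = np.meshgrid(xi2, xi3)
    r1 = np.sqrt(x0 ** 2 + Y ** 2 + (Z - h) ** 2)          # TX -> dS
    r2 = np.sqrt((d - x0) ** 2 + Y ** 2 + (Z - h) ** 2)    # dS -> RX
    integrand = np.exp(-2j * np.pi / lam * (r1 + r2 - d)) / (r1 * r2)
    integral = integrand.sum() * step * step
    return 1 - 1j * d / lam * integral

def excess_attenuation_db(c):
    """Excess attenuation A = -10 log10 |C|^2."""
    return -10 * np.log10(np.abs(c) ** 2)

a_center = excess_attenuation_db(csi_diffraction(2.0, 0.0))  # body on the LOS
a_off = excess_attenuation_db(csi_diffraction(2.0, 1.0))     # body 1 m off the LOS
```

In this configuration one should expect several dB of attenuation for the body at mid-link, and a nearly unperturbed field (within about ±1 dB) when the same body stands well outside the first Fresnel zone. As a sanity check of the sign convention, an infinite absorbing plane makes the integral equal to $\lambda/(jd)$, so $C = 0$.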

B. MULTIPLE ANTENNA ARRAY CONFIGURATIONS
We now consider a ULA configuration with links ordered as $-M \le \ell \le M$, RX$_\ell$ being the receiver node of link $\ell$. The central antenna of the array has index $\ell = 0$. As shown in Fig. 2, each $\ell$-th antenna RX$_\ell$ of the array is uniformly deployed at mutual distance $a$ along a segment orthogonal to the LOS, at distance $d$ from the TX, and horizontally placed w.r.t. the floor. Ignoring mutual antenna coupling (approximately valid for $a > \lambda/4$, see [37]), the CSI observed on the $\ell$-th antenna of the array corresponds to the ratio of the electric fields $C_{\ell,\theta} = E_{\ell,\theta} / E_{\ell,0}$; therefore, using (5):
$$C_{\ell,\theta} = 1 - j\,\frac{d_\ell}{\lambda} \iint_S \frac{1}{r_{1,\ell}\, r_{2,\ell}} \exp\!\left[-j\frac{2\pi}{\lambda}\left(r_{1,\ell} + r_{2,\ell} - d_\ell\right)\right] d\xi_2\, d\xi_3, \tag{6}$$
where $E_{\ell,0}$ is the EM field received by the same RX$_\ell$ node in the reference condition, i.e., the free-space scenario. The term $d_\ell$ indicates the distance of the $\ell$-th antenna RX$_\ell$ of the array from the TX, while $d_{1,\ell}$ and $d_{2,\ell}$ are the distances of the projection point $O$ (of the barycenter $P$ of the 2D surface $S$) from the TX and RX$_\ell$ nodes. Likewise, $r_{1,\ell}$ and $r_{2,\ell}$ are the distances of the generic elementary area $dS$ of the target $S$ from the TX and RX$_\ell$, respectively. Notice that, for $M = 0$, (6) reduces to the single-antenna case (5), where RX$_0$ coincides with the RX antenna at distance $d = d_0$ from the TX. CSI data ($C_{\ell,\theta}$) and the corresponding excess attenuation values ($A_{\ell,\theta} = -10 \log_{10} |C_{\ell,\theta}|^2$) represent the human body blockage effects for link $\ell$. These are organized into the vectors:
$$\mathbf{C}_\theta = [C_{-M,\theta}, \ldots, C_{M,\theta}]^T, \qquad \mathbf{A}_\theta = [A_{-M,\theta}, \ldots, A_{M,\theta}]^T. \tag{7}$$
In addition to the CSI terms $\mathbf{C}_\theta$, the EM field $E_{\ell,\theta}$ observed on link $\ell$ can also be rearranged according to (6) as:
$$E_{\ell,\theta} = C_{\ell,\theta}\, E_{\ell,0}, \qquad E_{\ell,0} = E_{0,0}\, \frac{d_0}{d_\ell} \exp\!\left[-j\frac{2\pi}{\lambda}\left(d_\ell - d_0\right)\right], \tag{8}$$
where $E_{0,0}$ is the electric field received by the central antenna of the array, of index $\ell = 0$. Using (8) and (6), the EM field for each considered link can be obtained as:
$$E_{\ell,\theta} = E_{0,0}\, \frac{d_0}{d_\ell}\, e^{-j\frac{2\pi}{\lambda}(d_\ell - d_0)} \left\{1 - j\,\frac{d_\ell}{\lambda} \iint_S \frac{e^{-j\frac{2\pi}{\lambda}(r_{1,\ell} + r_{2,\ell} - d_\ell)}}{r_{1,\ell}\, r_{2,\ell}}\, d\xi_2\, d\xi_3\right\}. \tag{9}$$

C. EFFECTS OF HUMAN BODY BLOCKAGE ON ARRAY RESPONSE
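Based on the previous analysis, we now highlight the EM effects of target movements on the array response [38] of conventional linear beamforming processing [44]. Using the same ULA configuration, we consider the vector $\mathbf{w}(\gamma) = [w_{-M}(\gamma), \ldots, w_{M}(\gamma)]^T$ of linear beamforming coefficients designed to steer the array in a direction $\gamma$. The received baseband signal $r_\theta(\gamma)$ at the output of the beamforming processing is given by:
$$r_\theta(\gamma) = \mathbf{w}^H(\gamma)\left(\mathbf{E}_\theta + \mathbf{n}\right) = \sum_{\ell=-M}^{M} w_\ell^*(\gamma)\left(E_{\ell,\theta} + n_\ell\right), \tag{10}$$
where $^H$ indicates the conjugate transpose operation, $\mathbf{E}_\theta = [E_{-M,\theta}, \ldots, E_{M,\theta}]^T$, and $n_\ell$ is the $\ell$-th element of the Additive White Gaussian Noise (AWGN) complex vector $\mathbf{n} = [n_{-M}, \ldots, n_{-1}, n_0, n_1, \ldots, n_M]^T$ of size $2M + 1$, which is assumed to be spatially white with zero mean and covariance $\sigma^2 \mathbf{I}$. Neglecting the AWGN noise² and considering the CSI $C_{\ell,\theta}$ defined in (6), the array response $R_\theta(\gamma)$ due to a target in state $\theta$ is defined as [38]:
$$R_\theta(\gamma) = \left|\mathbf{w}^H(\gamma)\, \mathbf{E}_\theta\right|^2, \tag{11}$$
where the fields $E_{\ell,\theta}$ follow from the CSI terms $C_{\ell,\theta}$ via (9). Notice that conventional ULA scenarios assume planar wavefront propagation. In this case, the steering vector $\mathbf{w}(\gamma)$ for the considered array is given by [44]:
$$w_\ell(\gamma) = \frac{1}{\sqrt{2M+1}} \exp\!\left(j\frac{2\pi}{\lambda}\, \ell\, a \sin\gamma\right), \qquad -M \le \ell \le M, \tag{12}$$
where $a = \lambda/2$ is the inter-element antenna distance.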
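According to (12), it is also
$$R_\theta(\gamma) = \frac{1}{2M+1} \left|\sum_{\ell=-M}^{M} e^{-j\pi \ell \sin\gamma}\, E_{\ell,\theta}\right|^2. \tag{13}$$
The dominant Direction of Arrival (DoA) $\gamma_{\max}$, namely the direction of maximum array response, is obtained as:
$$\gamma_{\max} = \arg\max_{\gamma}\, R_\theta(\gamma), \tag{14}$$
and will be considered in the analysis of Section V.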
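A sketch of conventional (delay-and-sum) beamforming on the ULA and of the search for the dominant DoA over a grid of steering angles. It assumes a planar wavefront, $a = \lambda/2$, and a steering vector normalized by $1/\sqrt{2M+1}$; the normalization and the phase sign are assumptions that do not move the peak. In the setting above, the input vector would hold the fields $E_{\ell,\theta}$. Here a synthetic plane wave from 20° is used so that the expected peak is known. Function names are illustrative.

```python
import numpy as np

def steering(gamma, M, a_over_lam=0.5):
    """Conventional ULA steering vector (planar wavefront), elements l = -M..M."""
    ell = np.arange(-M, M + 1)
    return np.exp(2j * np.pi * a_over_lam * ell * np.sin(gamma)) / np.sqrt(2 * M + 1)

def array_response(x, gammas, a_over_lam=0.5):
    """R(gamma) = |w(gamma)^H x|^2 for every steering angle in `gammas`."""
    M = (len(x) - 1) // 2
    W = np.stack([steering(g, M, a_over_lam) for g in gammas])  # (G, 2M + 1)
    return np.abs(W.conj() @ x) ** 2

gammas = np.deg2rad(np.linspace(-90.0, 90.0, 1801))       # 0.1 deg grid
x = steering(np.deg2rad(20.0), M=4) * np.sqrt(9)           # unit-amplitude plane wave from 20 deg
R = array_response(x, gammas)
gamma_max = gammas[np.argmax(R)]                           # dominant DoA
```

Because $\sin\gamma$ is monotonic on $[-90°, 90°]$ and $a = \lambda/2$ rules out grating lobes, the response has a single main peak, reached at 20° with value $2M + 1 = 9$ for this unit-amplitude input.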

IV. EM-INFORMED GENERATIVE NEURAL NETWORKS TOOLS
The generative models considered in this section reproduce body-induced EM effects $E_\theta$ as sampled from the conditional prior distribution $p(E_\theta \mid \theta_k)$ in (3); the prior is thus conditioned on the input body features $\theta_k$. As shown in Fig. 3, the generation process is implemented by a decoder (VAE) or a generator (GAN), parameterized by deep Neural Network (NN) parameters $W_D$ and $W_G$, respectively. The neural networks map the input latent space $z \sim p_Z(z)$ of size $Z$ ($z \in \mathbb{R}^{Z \times 1}$) into the output space:
$$\hat{E}_\theta = G(z, \theta_k; W), \qquad W \in \{W_D, W_G\}. \tag{15}$$
The generated samples $\hat{E}_\theta$ are thus set to reproduce the targeted EM model, namely $p_{\mathrm{gen}}(\hat{E}_\theta \mid \theta_k) \approx p(E_\theta \mid \theta_k)$. As shown in Fig. 3, the NN parameters $W_D$ and $W_G$ constitute the generation models; separate models are trained to reproduce either body-induced excess attenuations $A_\theta$ or EM field samples $E_\theta$.
² In line with the setup described in (1), the generative model is designed to reproduce the prior effects of body movements on the response of the array; therefore, it appears reasonable to neglect the effects of measurement and AWGN noise, as well as fading.
Below, we discuss VAE and GAN model architectures [25], [26], referred to as conditional-VAE (C-VAE) and unbalanced conditional-GAN (UC-GAN) [46]. Both models are adapted to generate samples conditioned on input body features $\theta_k$ that can be chosen at run-time. In the following, we limit our focus to the body feature set $\theta_k = [p_k, \phi_k, h_S, w_{S,1}, w_{S,2}]$, so as to generate body effects for varying locations $p_k$, orientations $-\pi/2 \le \phi_k \le \pi/2$, and sizes $h_S$, $w_{S,1}$, $w_{S,2}$ of the target. Although different approaches are possible, the problem is complex enough to make a full EM simulation infeasible, which motivates the use of GNN models.

A. CONDITIONAL VARIATIONAL AUTOENCODER (C-VAE)
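As depicted in Figs. 3a) and 3c), the C-VAE model uses an encoder $Q(z \mid E_\theta, \theta_k; W_E)$, parameterized by NN parameters $W_E$, which learns the latent space $p_Z(z \mid \theta_k) \sim \mathcal{N}(\mu_k, \sigma_k^2)$ for inputs $\theta_k$. The latent space is multivariate Gaussian distributed with mean $\mu_k$ and standard deviation $\sigma_k$ (other choices are not investigated here). The encoder is trained using samples $E_\theta$ obtained from the EM model (9) and the corresponding body states $\theta_k$. Model training is further discussed in Section V. The decoder produces a distribution obtained by marginalizing, over the latent space, the conditional probability $p^{\mathrm{VAE}}_{\mathrm{gen}}(E_\theta \mid z, \theta_k; W_D)$, which is a function of the NN parameters $W_D$:
$$p^{\mathrm{VAE}}_{\mathrm{gen}}(E_\theta \mid \theta_k; W_D) = \int p^{\mathrm{VAE}}_{\mathrm{gen}}(E_\theta \mid z, \theta_k; W_D)\, p_Z(z)\, dz.$$
The goal is to maximize a lower bound on the likelihood, called the Evidence Lower BOund (ELBO) $\mathcal{L}_{\mathrm{ELBO}}$ [45]. Omitting the dependency on the parameters $W_E$ and $W_D$, it is:
$$\mathcal{L}_{\mathrm{ELBO}} = \mathbb{E}_{Q(z \mid E_\theta, \theta_k)}\!\left[\log p^{\mathrm{VAE}}_{\mathrm{gen}}(E_\theta \mid z, \theta_k)\right] - \beta\, D_{\mathrm{KL}}\!\left[Q(z \mid E_\theta, \theta_k) \,\middle\|\, p_Z(z)\right], \tag{16}$$
where the first term $\mathbb{E}_{Q}[\log p^{\mathrm{VAE}}_{\mathrm{gen}}(E_\theta \mid z, \theta_k)]$ is the log-likelihood function, while the second one is the Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}$ [47] between the encoder output and the input latent space, weighted by $\beta > 0$.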
Maximization of the likelihood term in (16) makes the generated samples $\hat{E}^{\mathrm{VAE}}_\theta$ more correlated to the latent variables $z$, which typically makes the model more deterministic. On the other hand, the number of latent variables $Z$, as well as the ELBO weight $\beta > 0$, can be tuned to increase the contribution of the KL divergence between the posterior and the prior to the total ELBO, and thus to increase the randomness of the generated samples. Targeting passive localization applications, in Section V we show that both terms ($Z$, $\beta$) can be optimized to improve the generation process, also to account for measurements $S_t$ affected by noise and multipath interference.
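The β-weighted ELBO in (16) can be sketched in NumPy for a diagonal-Gaussian encoder. This sketch assumes a standard normal latent prior $p_Z(z) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ for the KL term and a Gaussian decoder with fixed standard deviation. Under these assumptions the log-likelihood reduces to a scaled squared reconstruction error. The defaults $\beta = 0.05$ and $Z = 16$ follow the values discussed in Section V-A. Function names are illustrative, and gradient computation (left to an autodiff framework in practice) is omitted.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL[N(mu, diag(exp(log_var))) || N(0, I)] per sample (last axis = latent)."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def beta_elbo(x, x_rec, mu, log_var, beta=0.05, sigma_x=1.0):
    """Batch-averaged beta-ELBO, up to an additive constant.

    Assumes a Gaussian decoder with fixed std sigma_x, so the log-likelihood
    term is a scaled squared reconstruction error. Training maximizes this
    value (equivalently, minimizes its negative)."""
    log_lik = -0.5 * np.sum((x - x_rec) ** 2, axis=-1) / sigma_x ** 2
    return float(np.mean(log_lik - beta * gaussian_kl(mu, log_var)))

rng = np.random.default_rng(0)
mu, log_var = np.zeros((8, 16)), np.zeros((8, 16))  # batch of 8, Z = 16 latents
z = reparameterize(mu, log_var, rng)                # latent draws fed to the decoder
x = rng.standard_normal((8, 81))                    # e.g., attenuations on 81 links
elbo_perfect = beta_elbo(x, x, mu, log_var)         # zero error and zero KL
```

Raising `beta` penalizes encoder posteriors that drift away from the prior more strongly. This keeps the latent codes closer to $\mathcal{N}(\mathbf{0}, \mathbf{I})$ and, as noted above, increases the randomness of the generated samples.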

B. UNBALANCED CONDITIONAL GAN (UC-GAN)
GAN training, depicted in Figs. 3b) and 3d), is formulated as a min-max problem that can be interpreted as an adversarial game with two players. The first is the discriminator $D(\hat{E}_\theta, E_\theta; W_I) \in [0, 1]$, namely a binary classifier that tries to improve the detection of fake EM field samples. The second is the generator, which is designed to fool the discriminator. The generator now produces samples $\hat{E}^{\mathrm{GAN}}_\theta = G(z, \theta_k; W_G)$, distributed as $p^{\mathrm{GAN}}_{\mathrm{gen}}(\hat{E}_\theta \mid \theta_k; W_G)$ and driven by $z \sim p_Z(z) = \mathcal{N}(\mathbf{0}, \mathbf{I})$. The goal is now to minimize the statistical distance (Jensen-Shannon divergence, based on [47]) between $p(E_\theta \mid \theta_k)$ and $p^{\mathrm{GAN}}_{\mathrm{gen}}$. This corresponds to the discriminator maximizing the adversarial objective while the generator minimizes it. More details can be found in [26]. Physics-informed GAN-based models were proposed in several works [23], [32]. For the considered problem, we adopted an unbalanced implementation [46], which pre-trains the generator using the parameters $W_D$ of the C-VAE decoder (15). This prevents the discriminator from converging faster than the generator in early epochs, which could limit the generator reproduction accuracy.
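The adversarial objective can be sketched through its two loss functions, computed from discriminator outputs on real (EM-model) and generated samples. This is the standard binary cross-entropy form of the GAN minimax game. The non-saturating generator variant is a common practical choice and an assumption here, since the exact UC-GAN losses are not detailed in this section. Names are illustrative.

```python
import numpy as np

EPS = 1e-12  # keeps log() finite when D outputs exactly 0 or 1

def discriminator_loss(d_real, d_fake):
    """Discriminator binary cross-entropy: the negative of
    V(D, G) = E[log D(E)] + E[log(1 - D(G(z)))], so minimizing it maximizes V."""
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake, non_saturating=True):
    """Generator objective. Minimax form: E[log(1 - D(G(z)))], to be minimized.
    Non-saturating surrogate: -E[log D(G(z))], which gives stronger gradients
    early in training, when D easily rejects generated samples."""
    if non_saturating:
        return -np.mean(np.log(d_fake + EPS))
    return np.mean(np.log(1.0 - d_fake + EPS))

half = np.full(4, 0.5)                    # D cannot tell real from generated
loss_eq = discriminator_loss(half, half)  # 2 log 2, i.e. V = -log 4 at equilibrium
```

At the game equilibrium the discriminator outputs 0.5 everywhere and $V = -\log 4$, the point where the Jensen-Shannon divergence between the target and generated distributions vanishes. Initializing the generator from the C-VAE decoder, as in the unbalanced scheme, starts training closer to that point.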

C. MODEL TRAINING AND IMPLEMENTATION CONSIDERATIONS
The C-VAE and UC-GAN pre-trained models shown in Fig. 3 are available online [41]. They come with example code for training on new samples and for testing, namely for generating body-induced excess attenuations ($\hat{A}_\theta$) and CSI ($\hat{C}_\theta$) for specific body configurations. In the following, we discuss a few critical implementation constraints and related general considerations useful for the case study of Section VI.
GNN trainable parameters and models. Considering the C-VAE in Figs. 3a) and 3c), the encoder model $W_E$ includes two key components: a sequence of convolutional layers and a feed-forward network. The encoder takes as input the training samples obtained from the diffraction model as well as the conditional inputs $\theta_k$. Excess attenuation ($A_\theta$) and EM field ($E_\theta$) generation require a different number of convolutional layers, reflecting the dimension of the data: 2 layers are chosen for reproducing excess attenuations $A_\theta$, while 3 layers are required to generate EM field samples $E_\theta$. This choice is conservative, since it is critical to limit the number of trainable model parameters, while no apparent performance improvement is observed beyond this limit. The decoder $W_D$ reproduces samples of body effects as a function of customized inputs $\theta_k$, which are one-hot encoded before being fed to the neural network. It uses transposed convolution layers, also referred to as fractionally-strided convolutions, to increase (upsample) the size of the feature maps.
Generation times. The generation times of the C-VAE, UC-GAN, and EM diffraction models are compared in Tab. 1, considering single-antenna TX and RX and a MIMO setup with $L = 81$ links. The reproduction of excess attenuations $A_\theta$ and of EM field samples $E_\theta$ is analyzed separately. For each case, time measurements are obtained using a Jetson Nano single-board computer equipped with a quad-core ARM Cortex-A57 SoC, 4 GB RAM, and a 128-core Maxwell GPU supporting the Compute Unified Device Architecture (CUDA). This is representative of a typical resource-constrained wireless device. Note that, on average, VAE/GAN-based generation of the EM body effects is about 60–100 times faster than EM model computation. The latter also depends on the chosen numerical integration configuration (i.e., tiled integration method and absolute error tolerance), the target size, and the antenna configuration (omnidirectional [15] vs. directional [42] radiation patterns). The generative model can therefore be used to reproduce the desired prior distribution in real time, with sufficiently high randomness of samples. The generation rate is in the order of 50–100 samples per second, which is adequate considering typical body movement speeds (max. 1 m/s).
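Speed-ups such as those in Tab. 1 are hardware dependent. The generic harness below, built on stand-in functions rather than the paper's models, is one way to measure per-sample generation time and the resulting rate. At 50–100 samples/s and a maximum body speed of 1 m/s, the body moves 1–2 cm between consecutive samples, which is below the $\Delta = 5$–$10$ cm displacement used for the prior in Section II-B. `toy_decoder` and `toy_em_integral` are hypothetical placeholders.

```python
import time
import numpy as np

def time_per_call(fn, n_calls=20):
    """Average wall-clock time [s] of fn() over n_calls, after one warm-up call."""
    fn()
    t0 = time.perf_counter()
    for _ in range(n_calls):
        fn()
    return (time.perf_counter() - t0) / n_calls

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 81))

def toy_decoder():
    # hypothetical stand-in for a trained decoder: Z = 16 latents -> 81 links
    return np.tanh(rng.standard_normal(16) @ W)

def toy_em_integral():
    # hypothetical stand-in for a dense numerical diffraction integral
    phase = rng.uniform(0.0, 2 * np.pi, (400, 400))
    return np.exp(1j * phase).sum()

t_gen, t_em = time_per_call(toy_decoder), time_per_call(toy_em_integral)
speedup = t_em / t_gen
rate = 1.0 / t_gen                                   # generated samples per second

# body displacement between consecutive samples at 1 m/s for 50 and 100 samples/s
step_cm = [100.0 * 1.0 / r for r in (50, 100)]
```

On a real device, `toy_decoder` would be replaced by a forward pass of the trained generator and `toy_em_integral` by the diffraction computation. Warm-up calls matter on GPUs, where the first invocation often includes one-off initialization.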
GNN model footprints. For the proposed implementations, Tab. 2 analyzes the size of the trainable parameters, namely the model footprint, of the decoder (C-VAE) and of the generator (UC-GAN). Footprints range from 1 MB to 240 MB, with EM field generation ($E_\theta$) being more demanding in terms of memory occupation than excess attenuation ($A_\theta$) generation. Although out of the scope of the current paper, careful model pruning is desirable to minimize the memory footprint on resource-constrained devices [5].

V. ANALYSIS OF EM FIELD GENERATION ACCURACY
In this section, we assess the ability of the C-VAE and UC-GAN approaches to reproduce the EM diffraction effects (Section III) and, more generally, the effectiveness of the models in sampling from the prior $p(E_\theta \mid \theta_k)$ in (3). We consider the problem of reproducing attenuations ($A_\theta$) and CSI ($C_\theta$) separately, for varying input features and scenarios.
The setup consists of TX and RX nodes, each equipped with a ULA of 9 omnidirectional antennas spaced at $a = \lambda/2$, corresponding to $L = 81$ total links. The length of the central link of the array is $d = 4$ m, while all the links of the array are horizontally placed at height $h = 0.99$ m from the ground. The human target has a variable height $h_S$ and maximum and minimum transversal sizes $w_{S,1}$ and $w_{S,2}$, respectively. The C-VAE and UC-GAN generative models are trained using samples of EM body diffraction at carrier frequency $f_c = 2.4$ GHz. The training samples correspond to the following settings of the body configurations: […]. Notice that these limitations are reasonable as far as the goal is to represent a human body [16].
The C-VAE method requires parameter optimization, namely the optimization of the number of latent variables $Z$ and of the ELBO weight $\beta$, which is the goal of the first part of the analysis in Section V-A. Next, in Section V-B, the C-VAE and UC-GAN reproduction accuracy is compared considering a multiple antenna array configuration. Finally, in Section V-C, we evaluate the accuracy of the generated EM field samples in reproducing the array response.
Also, the target changes its orientation while standing in each marked location. Fig. 4b) shows the corresponding generated samples, now featuring a subject with the same dimensions moving across the LOS ($-0.5$ m $\le y \le 0.5$ m, $x = 1$ m). Finally, in Fig. 4c), the target is fixed in position $p = [0.5 \text{ m}, 0]$ but uniformly changes its orientation $\phi$ from $\phi = -\pi/2$ to $\phi = 0$. Generated samples are compared with the average EM body excess attenuations $A_{1,\theta}$ obtained numerically from (6) and (7) for the same link and corresponding positions (dashed lines). The excess attenuations are averaged over 50 random target movements in an elementary square area of size $\Delta = 0.1$ m surrounding the corresponding marked positions $p_k$. Considering all the tests, we found that using $Z = 16$ latent variables constitutes a good compromise between complexity and accuracy.

A. C-VAE LATENT VARIABLE OPTIMIZATION
In addition to average excess attenuation terms, […]. As evident from the corresponding cases, the number of latent variables $Z$ substantially affects the generated samples, while the ELBO weight $\beta$ seems to have less evident effects. The C-VAE tool configured with $Z = 16$ (and $Z = 32$) provides a good representation of the distribution of the excess attenuations when compared with that of the EM diffraction model. On the other hand, the C-VAE model seems to better reproduce the excess attenuation samples corresponding to targets placed at some distance from the TX (i.e., 2 m) than those close to the TX (i.e., 0.5 m). The trend is particularly evident when the C-VAE model is set to reproduce sample distributions with high variance and few training samples, as in the case of a small target size ($h_S = 1.4$ m, $w_{S,1} = 0.35$ m). Finally, the choice $\beta = 0.05$ stands as a good compromise between the average reproduction accuracy and the reconstruction of the entire probability function, which requires increasing the randomness of the generated samples [24].

B. C-VAE AND UC-GAN MODEL COMPARISON
In Fig. 6 […]. Tab. 3 reports a comparative analysis of C-VAE and UC-GAN generation for the same MIMO setup, in terms of Mean Squared Error (MSE) and Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}$ [47]. The latter compares the distance (divergence) […] be to augment the size of the training data for these more disadvantageous cases.
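The exact estimators behind Tab. 3 are not detailed in this section. One common way to compute an MSE and an empirical KL divergence between EM-model samples ($P$) and generated samples ($Q$) is to share histogram bins, as in the sketch below. `histogram_kl`, the bin count, and the `eps` smoothing of empty bins are assumptions, and the synthetic Gaussian samples only illustrate the usage.

```python
import numpy as np

def histogram_kl(p_samples, q_samples, bins=30, eps=1e-10):
    """Empirical D_KL(P || Q) from two sample sets on shared histogram bins.
    eps smooths empty bins so that the divergence stays finite."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()                 # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))

def mse(a, b):
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

rng = np.random.default_rng(1)
em = rng.normal(5.0, 1.0, 20000)       # e.g., EM-model attenuations [dB]
gen = rng.normal(6.0, 1.0, 20000)      # e.g., generated attenuations, biased by 1 dB
kl_same, kl_shift = histogram_kl(em, em), histogram_kl(em, gen)
```

For two unit-variance Gaussians whose means differ by 1 dB, the exact KL divergence is 0.5 nats, so `kl_shift` should land near that value. Identical sample sets give zero. The estimate depends on the bin count, which should be reported together with the metric.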

C. ANALYSIS OF GENERATED BEAMFORMING ARRAY RESPONSES
Based on the analysis in Section III-C, we now verify the ability of the proposed C-VAE model to reproduce the array response of conventional linear beamforming processing. We thus compare the reproduced response $R_\theta(\gamma \mid \hat{C}^{\mathrm{VAE}}_\theta)$, obtained using the generated EM field samples $\hat{C}^{\mathrm{VAE}}_\theta$, with the true response $R_\theta(\gamma \mid C_\theta)$ obtained by diffraction as in (11).

A. GENERATION OF RECEIVED SIGNAL STRENGTH SAMPLES
So far, we have considered generative models that reproduce human-body blockage effects $E_\theta$. In what follows, we highlight a dual use of the tool: reproducing the raw RSS measurements $\mathbf{S}_t = [S_{\ell,t}]_{\ell=1}^{L}$ for an assigned body state $\theta_k$ and link set $L$, as in the scenario of Fig. 8. The RSS measurements $\hat{S}_{\ell,t}$ are generated via Monte Carlo sampling using the C-VAE tool as follows:
$$\hat{S}_{\ell,t} = \begin{cases} P_{\ell,0} - \hat{A}^{\mathrm{VAE}}_{\ell,\theta} + w_{\ell,T}, & p_k \in F_\ell,\\ P_{\ell,0} - \hat{A}^{\mathrm{VAE}}_{\ell,\theta} + w_{\ell,0}, & p_k \notin F_\ell, \end{cases} \tag{17}$$
where $\hat{A}^{\mathrm{VAE}}_{\ell,\theta} \sim p^{\mathrm{VAE}}_{\mathrm{gen}}(A_{\ell,\theta} \mid \theta_k)$ and $P_{\ell,0}$ is the free-space received power, known or measured during calibration. The terms $w_{\ell,0}$ and $w_{\ell,T}$ model the log-normal multipath fading and the other noise sources. These disturbances differ depending on whether the target is located inside ($p_k \in F_\ell$) or outside ($p_k \notin F_\ell$) the first Fresnel zone $F_\ell$ of the considered link $\ell$. The noise $w_{\ell,0} \sim \mathcal{N}(0, \sigma_0^2)$ refers to the free-space case with $p_k \notin F_\ell$, while $w_{\ell,T} \sim \mathcal{N}(\mu_T, \sigma_T^2)$ refers to the case with $p_k \in F_\ell$. In the following, we set $\mu_T = 1.5$ dB, $\sigma_T = 1$ dB, and $\sigma_0 = 1$ dB [16].
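A sketch of the Monte Carlo RSS generation described above. It assumes that membership in the first Fresnel zone is tested on the horizontal plane via the excess path length $r_1 + r_2 - d \le \lambda/2$. It also assumes that the generated attenuations are passed in as an array; in the paper they would come from the C-VAE. The helper names and the -40 dBm free-space power are hypothetical, while $\mu_T$, $\sigma_T$, and $\sigma_0$ default to the values quoted above.

```python
import numpy as np

def in_first_fresnel(p, tx, rx, lam):
    """True if the 2D point p lies in the first Fresnel zone of link tx-rx,
    i.e. if the excess path r1 + r2 - d does not exceed lam / 2."""
    p, tx, rx = map(np.asarray, (p, tx, rx))
    excess = np.linalg.norm(p - tx) + np.linalg.norm(p - rx) - np.linalg.norm(rx - tx)
    return excess <= lam / 2

def generate_rss(p0_dbm, a_gen_db, inside, rng, mu_t=1.5, sigma_t=1.0, sigma_0=1.0):
    """Monte Carlo RSS samples S = P0 - A + w [dBm], with w ~ N(mu_T, sigma_T^2)
    if the target is inside the first Fresnel zone, else w ~ N(0, sigma_0^2)."""
    a_gen_db = np.asarray(a_gen_db, dtype=float)
    if inside:
        w = rng.normal(mu_t, sigma_t, a_gen_db.shape)
    else:
        w = rng.normal(0.0, sigma_0, a_gen_db.shape)
    return p0_dbm - a_gen_db + w

rng = np.random.default_rng(0)
lam = 3e8 / 2.4e9
inside = in_first_fresnel([2.0, 0.0], [0.0, 0.0], [4.0, 0.0], lam)   # on the LOS
outside = in_first_fresnel([2.0, 1.0], [0.0, 0.0], [4.0, 0.0], lam)  # 1 m off the LOS
s_in = generate_rss(-40.0, np.full(10000, 6.0), inside, rng)         # 6 dB blockage
s_out = generate_rss(-40.0, np.zeros(10000), outside, rng)
```

With a constant 6 dB blockage inside the Fresnel zone, the sample mean settles near $-40 - 6 + 1.5 = -44.5$ dBm. Outside, it stays near the free-space $-40$ dBm. Replacing the constant array with generator samples lets the attenuation uncertainty propagate into the RSS distribution.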
Fig. 9 compares the generated RSS samples $\hat{S}_{\ell,t}$ with the RSS measurements $S_{\ell,t}$ at 2.4 GHz for the two links of the scenario in Fig. 8; the corresponding standard deviation values are shown as well. The generative model can effectively predict the true RSS values for both links, with an average error of 2.7 dB and a maximum standard deviation error of 2.9 dB. On the other hand, as also observed in Section V-A, the C-VAE generation tool tends to over-estimate ($\varepsilon < 0$) the observed RSS values for target positions close to the transmitter or receiver, such as $x = 3$ m, since it is trained using diffraction models [15].

B. PASSIVE LOCALIZATION USING RF GENERATED SAMPLES
In this section, we discuss an example of passive localization. The goal is to detect, in real time, the distance $d_R$ of the target from the multi-antenna RX device, as in typical radar applications. Given the RSS observations $\mathbf{S}_t$ over the same links considered in Fig. 8, we want to recover the estimated target state $\hat{\theta}_k$, in our case the position $\hat{\mathbf{p}}_k$ of the target relative to the RX. The proposed use case is critical in industrial scenarios where human workers operate in areas with increasing levels of safety or privacy requirements. Enforcing safety/privacy constraints requires real-time monitoring and tracking of the human subject.
The estimated position $\hat{\mathbf{p}}_k$ of the target relative to the RX is obtained as in (2), replacing $\theta$ with $\mathbf{p}_k$; the target distance then follows as $d_R = \lVert \hat{\mathbf{p}}_k \rVert$. The C-VAE generated attenuations are used here to reproduce the prior distribution. Using simple Bayes' rule considerations and (2), the problem simplifies to $\hat{\mathbf{p}}_k = \arg\max_k \{p_A(\theta_k)\}$, where the probabilities $p_A(\theta_k)$ are computed from the prior on the excess attenuations. Tab. 5 analyzes the precision and recall probabilities, evaluated with respect to the region $\mathcal{L}(d_R)$ that contains the positions $\mathbf{p}_k$ of the target at distance $d_R$ from the RX. The recall measures how often the algorithm correctly identifies the target distance among all the cases in which the target is actually at that distance, while the precision indicates how often the algorithm is correct when it predicts that distance. The table analyzes precision and recall for varying distance $d_R$ from the RX. Three cases are considered: i) estimated prior: the prior probability $p_{\mathrm{gen}}(A_{\ell,\theta} \mid \theta_k)$ is estimated from calibration measurements; ii) C-VAE prior: the probability $p^{\mathrm{VAE}}_{\mathrm{gen}}(A_{\ell,\theta} \mid \theta_k)$ is adopted as prior model, with samples obtained using the C-VAE generator tool; iii) uniform prior: no information on the excess attenuation is available, and the prior is replaced with a uniform probability function $\mathcal{U}_{[-5, 15]\,\mathrm{dB}}$ with attenuations ranging from $-5$ dB to $15$ dB.
Note that scenario i) gives the best performance, as expected; on the other hand, it requires time-consuming data collection and a calibration stage, which might not be feasible in practice. Case iii) corresponds to the worst-case scenario, since no prior information on body-induced attenuations is available. Finally, case ii) does not need any calibration, as it uses the C-VAE tool to generate samples from the prior $p^{\mathrm{VAE}}_{\mathrm{gen}}(A_{\ell,\theta} \mid \theta_k)$ in real time. From the results in Tab. 5, the performance of the C-VAE prior scenario approaches that of the estimated prior case, with an average drop of about 10%.
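The MAP decision and the precision/recall metrics can be sketched as below. This is a minimal illustration, not the paper's estimator: it assumes that the observed excess attenuation on link $\ell$ is $a_\ell = P_{\ell,0} - S_{\ell,t}$, that links are independent, that the prior over the $K$ grid states is uniform, and that each density $p(A_\ell \mid \theta_k)$ is approximated by a Gaussian kernel density estimate (bandwidth 1 dB, an arbitrary choice) built from generated samples. The three toy states, distances and attenuation means are hypothetical.

```python
import math
import random


def kde_logpdf(a, samples, bw=1.0):
    """Log of a Gaussian kernel density estimate built from generated samples."""
    norm = 1.0 / (len(samples) * bw * math.sqrt(2.0 * math.pi))
    dens = norm * sum(math.exp(-0.5 * ((a - s) / bw) ** 2) for s in samples)
    return math.log(dens + 1e-300)  # guard against log(0)


def map_state(observed_att, generated):
    """Index k maximising sum_l log p(A_l = a_l | theta_k) (uniform prior on k,
    links assumed independent). generated[k][l] holds attenuation samples [dB]."""
    scores = [sum(kde_logpdf(a, samples) for a, samples in zip(observed_att, gen_k))
              for gen_k in generated]
    return max(range(len(scores)), key=scores.__getitem__)


def precision_recall(true_cls, pred_cls, cls):
    """Precision and recall of the predictions for one distance class."""
    pairs = list(zip(true_cls, pred_cls))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy setup (hypothetical numbers): 3 states at distances d_R = 1, 2, 3 m, 2 links.
rng = random.Random(42)
d_r = [1.0, 2.0, 3.0]
means = [(10.0, 2.0), (5.0, 5.0), (2.0, 10.0)]  # mean attenuation per link [dB]
generated = [[[rng.gauss(m, 1.5) for _ in range(200)] for m in mk] for mk in means]

true_d, pred_d = [], []
for k, mk in enumerate(means):
    for _ in range(100):
        obs = [rng.gauss(m, 1.5) for m in mk]  # stands in for P0 - measured RSS
        true_d.append(d_r[k])
        pred_d.append(d_r[map_state(obs, generated)])

for d in d_r:
    print(d, precision_recall(true_d, pred_d, d))
```

Swapping the generated samples for calibration data or for draws from $\mathcal{U}_{[-5, 15]\,\mathrm{dB}}$ would mimic cases i) and iii) of the comparison in the same framework.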

VII. CONCLUSIONS AND FUTURE ACTIVITIES
The paper proposed the use of EM-informed Generative Neural Network (GNN) models to predict body-induced diffraction effects, with applications in passive radio sensing and localization. A Variational Auto-Encoder (VAE) tool, namely the Conditional VAE (C-VAE), is designed to generate samples of the targeted EM model through latent variable encoding/decoding neural network operations. The tool reproduces EM field samples corresponding to specific human body states that are user-selectable and modifiable in real time during model exploitation. Adaptations of Generative Adversarial Networks (GANs) are also considered for comparative analysis. Generated samples reproduce both RF signal attenuations, i.e., Received Signal Strength (RSS), and base-band Channel State Information (CSI) for full EM analysis of human body blockage.
GNNs produce observations sampled from the Bayesian prior, which supports Bayesian estimation problems. Examples tailored to passive localization reveal the possibility of optimizing the generation process so as to limit the use of time-consuming calibration stages and intensive EM computations. Generated samples might also serve as synthetic training data for supervised or semi-supervised machine learning tasks, thus reducing the need for personal data collection, which could be misused for person (re-)identification.
Besides these advantages, the proposed generators require hyper-parameter optimization to achieve a good compromise between average reproduction accuracy and generalization capability. The former measures how close the prediction is to the training observations, while the latter quantifies the randomness of the generated samples, which is useful to predict effects not seen during training. When compared with real measurements, the generative tools appear to underestimate some human blockage effects, namely for small targets placed close to the receivers. This leaves room for improvement. The considered generative systems are currently trained to reproduce scalar diffraction effects, and 2D EM absorbing sheets are used to model 3D human bodies. Training the system with different EM blockage models, such as full-wave solutions or the Method of Moments (MoM), and/or using more accurate body models might increase the generalization capabilities.
Although still in their infancy, we expect physics-informed GNN models to become indispensable tools for designers in different scenarios. For example, future radio sensing tools will be paired with accurate EM modeling in the high frequency bands proposed in emerging wireless communication standards (6G and beyond). The possibility of generating large CSI tensor structures representing the full RF radiation field is also useful for emerging holographic methods and microwave imaging techniques based on Synthetic Aperture Radar (SAR). Finally, the proposed tools have proved effective in reproducing body motions in user-selectable locations, a property that is instrumental for privacy-selective sensing policies.

FIGURE 2. 2D layout of the radio link assuming the multiple antenna configuration of Fig. 1. Extension to a MIMO set-up is straightforward. The point $O$ is the projection of the barycenter $P$ of the target $S$ onto the $\ell$-th LoS path, having length $d_\ell$.

FIGURE 3. a) Conditional VAE (C-VAE) and b) Unbalanced Conditional GAN (UC-GAN) architectures for generating EM body model samples; c) C-VAE encoder and decoder neural network structures for generating excess attenuations $\mathbf{A}_\theta$ and EM field responses $\mathbf{E}_\theta$; d) corresponding discriminator and generator structures for UC-GAN. Dense, Conv and Conv T refer to fully connected, convolution and deconvolution layer [48] operators, respectively.

TABLE 1. C-VAE and UC-GAN vs diffraction model: body-induced excess attenuation and EM field generation time analysis.

[...] spatial dimensions of intermediate features, so that generated outputs respect the desired dimensions. Considering now the UC-GAN model in Figs. 3b) and 3d), the discriminator ($\mathbf{W}_I$) and the generator ($\mathbf{W}_G$) model structures include similar components. Following the unbalanced GAN implementation, the C-VAE decoder model parameters $\mathbf{W}_D$ are transferred to the generator $\mathbf{W}_G$ at the beginning of the training stage. To simplify comparison, the outputs of both models have the same dimension. For an assigned input $\theta_k = [\mathbf{p}_k, h_S, w_{S,1}, w_{S,2}]$, VAE and GAN generate 201 different subject orientations $\phi_k$ in the interval $-\pi/2 \le \phi_k \le \pi/2$, for all the $L$ configured physical links.

Fig. 4 shows an example of C-VAE generation of diffraction model samples, namely body-induced excess attenuations, using a number of latent variables ranging from $Z = 8$ to $Z = 32$ and ELBO weight $\beta = 0.05$. The subject moves along and across the (single) radio link in specific marked locations $\mathbf{p}_k$, and changes its orientation in the range $-\pi/2 \le \phi \le 0$. TX and RX are equipped with a single antenna ($L = 1$). Here, we are interested in generating the body excess attenuations $\hat{\mathbf{A}}^{\mathrm{VAE}}_\theta = [\hat{A}^{\mathrm{VAE}}_{1,\theta}]$. To account for the uncertainties introduced by different body postures and small, i.e., involuntary, movements in the assigned location $\mathbf{p}_k$, we report the average excess attenuations over 50 generated samples $\hat{A}^{\mathrm{VAE}}_{1,\theta} \sim p^{\mathrm{VAE}}_{\mathrm{gen}}(\mathbf{A}_\theta \mid \theta_k)$. In Fig. 4a), the C-VAE model reproduces the average excess attenuation samples for a subject moving along the LOS ($0.25$ m $\le x \le 3.75$ m, $y = 0$) with a step of 0.25 m, thus occupying 15 marked locations, from $\mathbf{p}_1 = [0.25\text{ m}, 0]$ to $\mathbf{p}_{15} = [3.75\text{ m}, 0]$. The target has different dimensions, namely $h_S = 1.4$ m, $w_{S,1} = 0.35$ m (black), $h_S = 2.0$ m, $w_{S,1} = 0.65$ m (red), and $h_S = 1.65$ m, $w_{S,1} = 0.65$ m (green), with $w_{S,2} = 0.25$ m in all cases. The target also changes its orientation while standing in each marked location. Fig. 4b) shows the corresponding generated samples for a subject with the same dimensions moving across the LOS ($-0.5$ m $\le y \le 0.5$ m, $x = 1$ m). Finally, in Fig. 4c), the target is fixed in position $\mathbf{p} = [0.5\text{ m}, 0]$ while uniformly changing its orientation from $\phi = -\pi/2$ to $\phi = 0$.
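The conditional generation and averaging step described above can be sketched as follows: for each marked location, $Z = 16$ latent variables are drawn from $\mathcal{N}(\mathbf{0}, \mathbf{I})$, the decoder is conditioned on $\theta_k = [\mathbf{p}_k, h_S, w_{S,1}, w_{S,2}]$ and returns one attenuation per orientation (201 values in $[-\pi/2, \pi/2]$), and 50 outputs are averaged. The `toy_decoder` below is purely hypothetical (an arbitrary projected-width formula, not a trained network nor a diffraction model, and it ignores the position for brevity); only the grid sizes come from the text.

```python
import math
import random

Z = 16          # latent dimension retained in the text
N_SAMPLES = 50  # generated samples averaged per marked location
N_ORIENT = 201  # orientations phi in [-pi/2, pi/2] produced for each input theta_k
PHI = [-math.pi / 2 + math.pi * i / (N_ORIENT - 1) for i in range(N_ORIENT)]


def toy_decoder(z, theta):
    """Hypothetical stand-in for a trained C-VAE decoder (NOT a diffraction model).

    Maps a latent vector z and a body state theta = (x, y, h_S, w_S1, w_S2) to one
    excess attenuation [dB] per orientation phi in PHI; the position is ignored."""
    _, _, h_s, w_s1, w_s2 = theta
    shift = 0.5 * sum(z) / math.sqrt(len(z))  # latent-driven variability
    return [10.0 * h_s * (w_s1 * abs(math.cos(p)) + w_s2 * abs(math.sin(p))) + shift
            for p in PHI]


def average_generated(theta, decoder, rng, z_dim=Z, n=N_SAMPLES):
    """Per-orientation mean of n decoder outputs with z ~ N(0, I_Z)."""
    acc = [0.0] * N_ORIENT
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
        acc = [a + v for a, v in zip(acc, decoder(z, theta))]
    return [a / n for a in acc]


# 15 marked locations along the LOS: x = 0.25, 0.50, ..., 3.75 m (y = 0).
positions = [0.25 * (i + 1) for i in range(15)]
rng = random.Random(1)
theta_grid = [(x, 0.0, 1.65, 0.65, 0.25) for x in positions]
curves = [average_generated(theta, toy_decoder, rng) for theta in theta_grid]
print(len(curves), len(curves[0]))
```

Averaging over the latent draws is what turns the stochastic generator output into the mean curves plotted against the diffraction model; the spread of the individual draws is what Fig. 5 examines instead.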
Generated samples are compared with the average EM body excess attenuations $A_{1,\theta}$ obtained from (6) and (7) via numerical methods for the same link and corresponding positions (dashed lines). The excess attenuations are averaged over 50 random target movements in an elementary square area of size 0.1 m surrounding the corresponding marked positions $\mathbf{p}_k$. Considering all the tests, we found that using $Z = 16$ latent variables constitutes a good compromise between complexity and accuracy. In addition to average excess attenuation terms, Fig. 5 analyzes the distribution of the generated excess attenuation samples $p^{\mathrm{VAE}}_{\mathrm{gen}}(\mathbf{A}_\theta \mid \theta_k)$ compared with those obtained from the EM diffraction model. Prior distributions are obtained for a varying number of latent variables $Z$ and two choices of ELBO weight, namely $\beta = 0.05$ in Figs. 5a) and 5c), and $\beta = 10^{-9}$ in Figs. 5b) and 5d). Three target configurations are considered, namely $h_S = 1.4$ m, $w_{S,1} = 0.35$ m in Figs. 5a), 5b), 5c) and 5d); $h_S = 1.65$ m, $w_{S,1} = 0.65$ m in Figs. 5e) and 5f); and $h_S = 2.0$ m, $w_{S,1} = 0.65$ m in Figs. 5g) and 5h).

FIGURE 4. C-VAE generation of body-induced excess attenuation values $A_{\ell,\theta}$ for different target movements (along and across the LOS, and varying orientations) and dimensions $(h_S, w_{S,1})$, with $w_{S,2} = 0.25$ m. C-VAE results are shown for a varying latent dimension $Z$ and $\beta = 0.05$. From left to right: a) the subject is moving along the LOS ($0.25$ m $\le x \le 3.75$ m, $y = 0$). The EM body excess attenuation values obtained via numerical methods are represented by dashed lines, averaging over random target orientations $-\pi/2 \le \phi \le \pi/2$ and random movements in an elementary square area of size 0.1 m. b) The subject is moving across the LOS ($x = 1$ m, $-0.5$ m $\le y \le 0.5$ m). Dashed lines show the corresponding ground-truth diffraction model samples obtained as in a). c) The target is in position $x = 0.75$ m, $y = 0$ and changes orientation $-\pi/2 \le \phi \le 0$ while performing small movements in the same elementary area. Dashed lines show the EM body model excess attenuations obtained for a subset of the subject orientations. Figs. a) and c) are also presented in [24].

FIGURE 5. C-VAE generated sample probabilities $p^{\mathrm{VAE}}_{\mathrm{gen}}(\mathbf{A}_\theta \mid \theta_k)$ of body-induced RF excess attenuations for varying latent dimensions ($Z = 8, 16, 32$) and ELBO weights $\beta$, compared with samples obtained from the EM body model (dashed lines). In a) and b), the target stands at $x = 0.5$ m from the TX with size $h_S = 1.4$ m, $w_{S,1} = 0.35$ m; generation uses $\beta = 0.05$ in a) and $\beta = 10^{-9}$ in b). In c) and d), the target stands at $x = 2$ m from the TX with size $h_S = 1.4$ m, $w_{S,1} = 0.35$ m, while $\beta = 0.05$ in c) and $\beta = 10^{-9}$ in d). The target with size $h_S = 1.65$ m, $w_{S,1} = 0.65$ m stands at $x = 0.5$ m from the TX in e) and at $x = 2$ m in f). The target with size $h_S = 2.0$ m, $w_{S,1} = 0.65$ m stands at $x = 0.5$ m from the TX in g) and at $x = 2$ m in h). In e), f), g) and h), $\beta = 0.05$.

FIGURE 6. C-VAE vs UC-GAN generation of the EM body model for a MIMO array consisting of 3 antennas at both the TX and the RX, i.e., $L = 9$ radio links. The subject has dimensions $h_S = 2$ m, $w_{S,1} = 0.55$ m, $w_{S,2} = 0.25$ m and is moving along the LOS path of link $\ell = 5$ ($0.25$ m $\le x \le 3.75$ m, $y = 0$). The EM body average excess attenuation values obtained through the C-VAE (solid lines) and UC-GAN (diamond markers) methods are compared with the corresponding diffraction model samples $\mathbf{A}_\theta = [A_{\ell,\theta}]_{\ell=1}^{L=9}$ (dashed lines). The C-VAE latent dimension is $Z = 16$ with $\beta = 0.05$, as optimized in Fig. 4. UC-GAN is pre-trained using the C-VAE decoder model.
In Fig. 6, we compare the C-VAE generation tool, using the optimized parameters $Z = 16$, $\beta = 0.05$ shown previously, with the UC-GAN implementation described in Section IV-B. The following analysis is of interest as it shows the behavior of two different generative systems and compares their ability to reproduce the EM body diffraction effects. With respect to the previous section, we now consider a MIMO ULA setup consisting of 3 antennas at both the TX and the RX, i.e., $L = 9$ radio links, and distance $d = 4$ m. The samples $\hat{\mathbf{A}}^{\mathrm{VAE}}_\theta = [\hat{A}^{\mathrm{VAE}}_{\ell,\theta}]_{\ell=1}^{L=9}$ obtained through the C-VAE (solid lines) and $\hat{\mathbf{A}}^{\mathrm{GAN}}_\theta = [\hat{A}^{\mathrm{GAN}}_{\ell,\theta}]_{\ell=1}^{L=9}$ obtained through the UC-GAN (diamond markers) are compared with the corresponding EM body-induced excess attenuations $\mathbf{A}_\theta = [A_{\ell,\theta}]_{\ell=1}^{L=9}$ obtained from diffraction theory (dashed lines). The subject has fixed dimensions $h_S = 2$ m, $w_{S,1} = 0.55$ m, $w_{S,2} = 0.25$ m, and moves along the LOS path of link $\ell = 5$ ($0.25$ m $\le x \le 3.75$ m, $y = 0$).
The KL divergence is computed between the generated sample probability functions, $p^{\mathrm{VAE}}_{\mathrm{gen}}$ or $p^{\mathrm{GAN}}_{\mathrm{gen}}$, and the theoretical ones obtained from EM diffraction. The MSE and KL divergence terms are computed for link $\ell = 5$ and varying target dimensions. The MSE values remain below 0.5 dB for the C-VAE, while they are about 1 dB higher for the UC-GAN. Similarly, the KL divergence shows that the C-VAE model better reproduces the true distribution of the samples ($D_{\mathrm{KL}} < 1$) than the UC-GAN, which features large deviations in some cases ($D_{\mathrm{KL}} > 3$). As previously noticed, the C-VAE model better reproduces the diffraction samples for targets placed at distance $x > 0.5$ m from the TX. Also, large target surfaces $S$ ($h_S \ge 1.8$ m) are better represented by the C-VAE model than small surfaces ($h_S \le 1.6$ m).
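The MSE and KL figures of merit can be computed from samples as sketched below. The paper does not detail the estimator here, so this sketch makes its own choices, labeled as assumptions: both sample sets are binned on a shared histogram grid, a small `eps` smooths empty bins, and the direction $D_{\mathrm{KL}}(p_{\mathrm{EM}} \,\|\, p_{\mathrm{gen}})$ is used. The toy Gaussian data are hypothetical.

```python
import math
import random


def histogram(samples, lo, hi, n_bins):
    """Normalised histogram over n_bins equal bins on [lo, hi]; outliers are clipped."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        i = int((min(max(s, lo), hi) - lo) / width)
        counts[min(i, n_bins - 1)] += 1
    return [c / len(samples) for c in counts]


def kl_divergence(p, q, eps=1e-6):
    """D_KL(p || q) in nats between two discrete distributions (eps avoids log 0)."""
    p = [pi + eps for pi in p]
    q = [qi + eps for qi in q]
    sp, sq = sum(p), sum(q)
    return sum(pi / sp * math.log((pi / sp) / (qi / sq)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equally long sequences (e.g. dB curves)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy comparison (hypothetical data): "EM" reference vs two generators.
rng = random.Random(7)
em = [rng.gauss(8.0, 2.0) for _ in range(5000)]
gen_close = [rng.gauss(8.2, 2.1) for _ in range(5000)]
gen_far = [rng.gauss(11.0, 1.0) for _ in range(5000)]
p_em = histogram(em, -5.0, 20.0, 50)
print(kl_divergence(p_em, histogram(gen_close, -5.0, 20.0, 50)),
      kl_divergence(p_em, histogram(gen_far, -5.0, 20.0, 50)))
```

The KL value depends on the bin width and on `eps`, so values computed this way are only comparable across methods when the same grid is used for all of them.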

Fig. 7 considers an RX-side uniform linear array (ULA) now consisting of $L = 9$ antennas (see Fig. 7a)) and shows the body-induced array response $20\log_{10}|R_\theta(\gamma)|$ as a function of the DoA $\gamma$ for different values of the $y$ displacement of the target (w.r.t. the central LOS), with $x = 2$ m. The array signal processing extracts the response for varying DoA $\gamma$ and is based on a Fast Fourier Transform (FFT) with $N_{\mathrm{FFT}} = 257$ points. Fig. 7b) shows two responses (red and green lines) for target locations $y = -0.25$ m and $y = 0.25$ m, respectively. The theoretical responses $R_\theta(\gamma \mid \mathbf{C}_\theta)$ obtained using EM diffraction are shown as dashed lines, while solid lines represent the reproduced responses $R_\theta(\gamma \mid \hat{\mathbf{C}}^{\mathrm{VAE}}_\theta)$ obtained using the C-VAE generated CSI samples $\hat{\mathbf{C}}^{\mathrm{VAE}}_\theta$. Fig. 7c) compares the DoA $\gamma_{\max}$ of the maximum array response, as defined in (13). Blue dots are the dominant DoAs obtained by maximizing the array response $R_\theta(\gamma \mid \mathbf{C}_\theta)$, while red dots refer to the DoAs produced by the C-VAE generated response $R_\theta(\gamma \mid \hat{\mathbf{C}}^{\mathrm{VAE}}_\theta)$. Both cases simulate a target moving across the LOS ($-0.25$ m $\le y \le 0.25$ m, $x = 2$ m) at a speed of 0.5 m/s and changing orientation randomly in the interval $0 \le \phi \le \pi/2$. The maximum response direction $\gamma_{\max}$ is perturbed by the presence of the subject, and this alteration is well reproduced by the C-VAE model.
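An FFT-based array response and the dominant DoA can be sketched as below. This is a minimal illustration, assuming conventional (delay-and-sum) beamforming over a ULA with half-wavelength spacing, so that DFT bin $k$ maps to $\sin\gamma = 2k/N_{\mathrm{FFT}}$, and ignoring any windowing or normalization that (11) and (13) may include. A direct zero-padded DFT over the 9 elements is used instead of an FFT library, which gives the same $N_{\mathrm{FFT}} = 257$ bins; the plane-wave CSI is a hypothetical toy input, not generated $\hat{\mathbf{C}}^{\mathrm{VAE}}_\theta$ samples.

```python
import cmath
import math

N_FFT = 257  # number of FFT points quoted in the text


def array_response(csi, n_fft=N_FFT):
    """Conventional beamforming via a zero-padded DFT across the array elements.

    Assumes a ULA with half-wavelength spacing, so DFT bin k maps to
    sin(gamma) = 2 * k / n_fft. Returns (gamma_deg, response_db), sorted by DoA."""
    gammas, resp_db = [], []
    for k in range(-(n_fft // 2), n_fft // 2 + 1):  # centred bins (n_fft odd)
        u = 2.0 * k / n_fft                         # u = sin(gamma)
        r = sum(c * cmath.exp(-1j * math.pi * u * n) for n, c in enumerate(csi))
        gammas.append(math.degrees(math.asin(u)))
        resp_db.append(20.0 * math.log10(abs(r) + 1e-12))
    return gammas, resp_db


def dominant_doa(gammas, resp_db):
    """DoA of the maximum array response (grid search over the DFT bins)."""
    return gammas[max(range(len(resp_db)), key=resp_db.__getitem__)]


# Usage: a single plane wave from 20 degrees on a 9-element ULA (toy CSI, unit gains).
gamma0 = math.radians(20.0)
csi = [cmath.exp(1j * math.pi * n * math.sin(gamma0)) for n in range(9)]
gammas, resp_db = array_response(csi)
print(round(dominant_doa(gammas, resp_db), 2))
```

With 257 bins the angular grid spacing near broadside is roughly 0.45 degrees, much finer than the main-lobe width of a 9-element array, so the zero padding mainly serves to locate $\gamma_{\max}$ precisely rather than to improve resolution.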
VI. CASE STUDIES IN PASSIVE RADIO LOCALIZATION
A specific case study is considered in this section. The goal is to demonstrate the effectiveness of the proposed GNN tools in reproducing the prior $p(\mathbf{E}_\theta \mid \theta_k)$ in (3) and in supporting a real-time localization process. The EM-informed C-VAE tool has thus been validated with measurements taken in a hall of size 6.1 m × 14.4 m, as shown in Fig. 8. Both TX and RX are equipped with directional antennas, whose parameters are summarized in the table embedded in Fig. 8. A mechanical handling mechanism, shown in the top left of Fig. 8, moves the RX antenna to specified positions where RF measurements on multiple links are collected. The target is located in $K = 75$ marked positions $\mathbf{p}_k$, $k = 1, \ldots, K$, which belong to a regular 2D grid as shown in the same figure. A spectrum analyzer with tracking generator [49] is used to measure the RSS $\mathbf{S}_t$ in the 2.4 to 2.5 GHz band over 81 frequencies with 1.25 MHz spacing. For each frequency and target position under test, 500 consecutive time samples are acquired in 1 min (120 ms sampling time). The human target (one of the authors, who volunteered) has height $h_S = 1.80$ m, and maximum and minimum transversal sizes approximated as $w_{S,1} = 0.55$ m and $w_{S,2} = 0.25$ m, respectively.
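The measurement grid figures are mutually consistent, as the short check below shows. It assumes the 81-point sweep starts exactly at 2.4 GHz and, for the total count, that all $K$ positions are measured at every frequency on a given link; both are assumptions of this sketch.

```python
F_START_HZ = 2.4e9   # assumed lower edge of the measured band
SPACING_HZ = 1.25e6  # frequency spacing
N_FREQ = 81          # number of measured frequencies
N_TIME = 500         # consecutive time samples per frequency and position
TS_S = 0.120         # sampling time [s]
K = 75               # marked target positions

freqs_hz = [F_START_HZ + i * SPACING_HZ for i in range(N_FREQ)]
span_hz = freqs_hz[-1] - freqs_hz[0]    # 80 * 1.25 MHz = 100 MHz, i.e. 2.4 to 2.5 GHz
duration_s = N_TIME * TS_S              # 500 * 120 ms = 60 s, i.e. 1 min
samples_per_link = K * N_FREQ * N_TIME  # RSS samples if every position uses every frequency

print(span_hz / 1e6, duration_s, samples_per_link)
```

Under these assumptions each link yields 3,037,500 RSS samples, which gives a sense of the calibration effort that the C-VAE prior is meant to avoid.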

FIGURE 8. Measurement setup, TX/RX antennas, and linear guide system for RX antenna positioning. Explored target positions around the Fresnel zone.

FIGURE 9. [...] and link $\ell = 2$. The subject is standing while performing small movements around 4 nominal positions $\mathbf{p}_k = (x, y)$, namely $x = 0.25$ m, $y = 0$ (blue); $x = 0.5$ m, $y = 0$ (violet); $x = 0.75$ m, $y = 0$ (yellow); and the position outside the Fresnel area, $x = 2$ m, $y = -0.5$ m (black).

Considering the same scenario, Fig. 10 evaluates the C-VAE generation of RSS samples obtained from two RX locations, namely links $\ell = 1$ (red) and $\ell = 2$ (green), for a target now moving along the LOS, $0.25$ m $\le x \le 3.75$ m ($y = 0$). The generated samples are again compared with RSS measurements in which the target moves along the LOS path at a constant speed of approximately 0.25 m/s. The average error $\varepsilon_\ell(\mathbf{p}_k) = \mathbb{E}_t[S_{\ell,t}(\mathbf{p}_k)] - \mathbb{E}_t[\hat{S}_{\ell,t}(\mathbf{p}_k)]$ between the RSS values $\hat{S}_{\ell,t}(\mathbf{p}_k)$ predicted by the C-VAE model and the corresponding measurements $S_{\ell,t}(\mathbf{p}_k)$ is summarized in Tab. 4 for varying target locations $\mathbf{p}_k$ along the LOS path, together with the corresponding standard deviation error $\delta_\ell(\mathbf{p}_k)$.
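The error metrics of Tab. 4 can be computed per link and per position as sketched below. The average error follows the definition $\varepsilon_\ell(\mathbf{p}_k) = \mathbb{E}_t[S_{\ell,t}] - \mathbb{E}_t[\hat{S}_{\ell,t}]$ given in the text, so $\varepsilon < 0$ flags an over-estimated RSS; the standard deviation error is assumed here to be the difference between the measured and generated (population) standard deviations, since its exact definition is not reported. The sample values are hypothetical.

```python
import statistics


def rss_errors(measured, predicted):
    """Average error epsilon and std error delta [dB] for one link and position.

    epsilon = E_t[S] - E_t[S_hat]: definition given in the text; epsilon < 0 means
              the generator over-estimates the measured RSS.
    delta   = std_t[S] - std_t[S_hat]: assumed definition (not given in the text).
    """
    eps = statistics.fmean(measured) - statistics.fmean(predicted)
    delta = statistics.pstdev(measured) - statistics.pstdev(predicted)
    return eps, delta


measured = [-60.0, -61.0, -59.0]   # toy measured RSS samples [dBm]
predicted = [-58.0, -58.5, -57.5]  # toy C-VAE generated samples [dBm]
print(rss_errors(measured, predicted))
```

In this toy case the generator predicts a 2 dB higher RSS than measured ($\varepsilon = -2$ dB), the same kind of over-estimation discussed for targets close to the TX or RX.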

FIGURE 10. C-VAE generation of RSS samples observed for links $\ell = 1$ (red) and $\ell = 2$ (green) described in Fig. 8, for a target moving along the LOS, $0.25$ m $\le x \le 3.75$ m ($y = 0$). Generated samples are compared with RSS measurements at 2.4 GHz. Both the generated samples and the corresponding true measurements are shown with a 60% Confidence Interval (CI).
TABLE 4. Average error $\varepsilon$ and standard deviation error $\delta$ between the true RSS and the RSS predicted via the C-VAE generative model (setup in Fig. 8).