Deep-Learning-Based ModCod Predictor for Satellite Channels

One of the significant challenges for satellite communications is to serve the ever-increasing demand with finite resources. One option is to increase channel utilization, i.e., to transmit as much data as possible in a given frequency range. Since the channel is highly variable, primarily due to the ionosphere and troposphere, this goal can only be achieved by adaptively varying modulation and coding schemes. Most procedures and algorithms estimate the channel characteristics and descriptive quantities (e.g., signal-to-noise ratio). Ultimately, these procedures solve a regression problem. The resulting quantity is used as the basis for a decision process. Since the estimation can also be subject to error, the decision mechanisms based on it must compensate for and mitigate this error. The main element of the current research is to combine these two steps and solve them together using deep neural networks. The theoretical advantage of the method is that a better result can be achieved by a joint estimation and decision process with a single algorithm and cost function. The theoretical approach was tested with an actual protocol, Digital Video Broadcasting - Satellite - Second Generation (DVB-S2), where we observed a significant improvement in channel utilization on previously recorded Alphasat satellite data.


Introduction
Satellite-based communication (SATCOM) is as old as the first satellites. Without being exhaustive, it plays an essential role in radio broadcasting, weather forecasting, maritime communications, assisted navigation, and military operations [1]. Today, satellite communications are used in a wide range of applications, which can be extended with new special applications (such as drone-to-satellite connections or possible 6G usage [2]). Current trends suggest that this area is only set to grow. Estimates of data traffic and monetary value vary, because the quantities involved are difficult to estimate and depend on technological, economic, political, and legal factors. A report by Market Research Future (MRFR) puts the satellite communication market at USD 41,860 million by 2025, at a compound annual growth rate of 8.40%; this comprehensive SATCOM research shows that the field is expected to see sustained and significant growth in all areas [3]. In addition, several new services are being developed by companies that are not inherently space-related [4], [5]. In other words, more and more companies, universities, and other organizations are deploying satellites to meet the growing demand and to carry out related research.
One way to meet the increasing traffic demands may be to adapt the coding and modulation schemes (adaptive coding and modulation, ACM) to exploit the channel's capabilities. Many existing protocols support adaptive techniques (such as DVB-S2 [6], i.e., Digital Video Broadcasting - Satellite - Second Generation), so from a practical point of view, the question of how to use them needs further research. The key to implementation is to know and understand the physical parameters of the channel as well as possible. The most important difference from terrestrial direct wireless links is that ground-to-satellite links pass through the Earth's entire atmosphere. Regarding electromagnetic signal propagation, two atmospheric layers significantly affect propagation: the troposphere and the ionosphere [7]. They cause a dynamic change in the attenuation of the medium, which can be divided into scintillation and fading based on duration. The former is a rapidly changing process, while the latter is a longer-lasting event, often causing attenuation of larger amplitude. SATCOM applications often use millimeter waves (mmWave), which are susceptible to tropospheric effects.
In summary, we either use additional satellites or implement more efficient transmission. In the second case, it is also possible that better ACM techniques could be applied to existing devices, so it is worth focusing on this research. It would be best to use procedures that do not require additional measurements or signals and can be calculated quickly in real time. There are few solutions of this kind whose results are publicly available, both because of their innovative nature and for the reasons explained above.
DOI: 10.13164/re.2024.0182
Against this background, the current research focuses on how artificial intelligence, specifically deep neural networks, can predict the best setting for the upcoming time step. By best setting, we mean the ACM state that maximizes channel utilization at a predefined error rate. This was done using historical data collected from an operational satellite, Alphasat [8]. The main question of the present research is whether, and under what conditions, it is possible to develop a procedure that estimates the ModCod (modulation and coding) state, which is essentially the ACM setting. As a secondary question, we investigate whether the process is feasible if we do not use the data channel signal for control. The DVB-S2 broadcasts from the Alphasat satellite were used as data for the research, using the capabilities of this protocol.
The structure of the paper is the following. In Sec. 2, the main features of the radio channel are presented, with a particular focus on the case of mmWaves. Afterward, in Sec. 3, our procedure, the signals, and the methods are presented. In Sec. 4, we detail our results. Finally, Sec. 5 is the conclusion.

Most Important Properties of Satellite Communications
Satellite-to-Earth links differ from other wireless communication methods in that the channel passes through the Earth's atmosphere; in addition, it is a direct link. With such a long open (unprotected) medium, transmission is sensitive to natural disturbances and, in some cases, to intentional disturbances [9]. Some of these problems, such as jamming, also affect the physical layer. This can reduce the channel's capacity and possibly make the connection completely unavailable. In the current research, such intentional adverse effects have not been considered. Unintended attenuation can be taken into account by considering the following categories during the design of satellite links:
1. Satellite Link Budget: The main static parameters of the link [10]. A well-designed link budget ensures that there is sufficient signal strength at the receiver to maintain reliable communication. More details are given in Sec. 2.1.
2. Satellite Elevation and Visibility: The physical location of the satellite and of the ground station used in the link, and the resulting problems to be solved [10]. More details are given in Sec. 2.1.
3. Interference: Interference from other sources, such as other satellites, terrestrial transmitters, or unintentional radio frequency emissions, can degrade the quality of satellite communication. It can be minimized primarily by a good choice of the GS environment, the use of a narrow beam, and the use of communication protocols [10].
4. Ionospheric Effects: The effect of ions on electromagnetic waves, usually the source of scintillation [10], [11]. More details are given in Sec. 2.2.
5. Atmospheric Attenuation: The impact of atmospheric gases. This determines the frequencies that can be appropriately selected [10], [12]. More details are given in Sec. 2.2.
Every system has an inertia arising from the transmission time. Depending on the satellite orbit, the signal path length varies, and the execution of every control message takes a different amount of time. In addition, both ground station (GS) and satellite data processing have a time requirement, which also delays the implementation of various controls.

Satellite Link Budget and Satellite Elevation and Visibility
The two concepts are not precisely the same and contain different elements, but both are closely related to preparing a link budget. A link budget calculates the overall gain and loss of a communication link between a transmitter and a receiver [14]. The quantities used are scaled to an expected average signal level, with a margin reserved for dynamically varying disturbances. The dominant attenuation is due to distance, with the free-space (vacuum) path loss providing the base value. The link budget of the DVB-S2 data used in this research for the GS in Budapest is shown in Tab. 1.
The elevation angle of the satellite from the receiver's location plays a crucial role in the quality of communication. Low satellite elevations may introduce higher atmospheric path loss, and obstacles such as buildings or terrain can block satellite visibility, leading to signal blockage. Because Alphasat is in a geosynchronous (inclined) orbit, the GS must also track the satellite so that the beam of the directional antenna stays pointed at it. A deviation of a few tenths of a degree causes several dB of extra attenuation (antenna pointing error). Of course, for satellites in geostationary orbit, this is not a problem [8]. The set of parameters presented in this section is always scaled to the demand, i.e., the average amount of data to be transferred.
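As a rough illustration of the base value mentioned above, the free-space path loss can be computed from the standard formula FSPL = 20 log10(4πdf/c). The sketch below is not taken from Tab. 1: the slant range of 38,000 km is an assumed, typical value for a geosynchronous satellite seen from mid-latitudes, and the frequency is the nominal 38.1 GHz payload carrier quoted elsewhere in the paper.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB for a given distance and carrier frequency."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C_LIGHT)

# ASSUMPTION: illustrative slant range, not a value from the paper's link budget.
slant_range_m = 38.0e6
# Nominal DVB-S2 payload frequency mentioned in the paper.
freq_hz = 38.1e9

loss_db = fspl_db(slant_range_m, freq_hz)
print(f"Free-space path loss: {loss_db:.1f} dB")
```

All other link budget terms (antenna gains, pointing loss, atmospheric margin) would be added to or subtracted from this base value; they are deliberately left out of the sketch.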

Earth and its Atmosphere
The Earth's atmosphere increasingly influences radio frequency (RF) signal transmission as the frequency increases. The propagation of RF signals is affected by atmospheric attenuation, primarily caused by the absorption of signals by gases like oxygen and water vapour. At frequencies below 10 GHz, the impact of signal absorption is relatively low and can be predicted. However, at higher frequencies, particularly in the mmWave range (30 to 300 GHz), this attenuation increases significantly, especially at specific frequencies, and becomes more dependent on the absorbing properties of water vapour, oxygen, and other gases [10], [12]. Figure 1 shows the average attenuation values at two different altitudes, together with the absorption peaks of each gas. In addition, two layers of the Earth's atmosphere need to be highlighted separately, because the processes in them change dynamically, and so does the attenuation of mmWaves: the troposphere, which is the lowest layer, and the ionosphere, which is the upper layer but has no exact lower boundary [7]. In the millimeter wavebands, tropospheric effects dominate the attenuation; this is where weather phenomena occur [11], [13]. As the attenuation varies, the physical communication channel, with a constant radiated power, does not have a constant capacity [10].
The ionosphere, an ionized region of the Earth's upper atmosphere, can introduce signal delays, phase distortions, and scintillation. These ionospheric effects vary depending on factors such as solar activity, time of day, and the satellite's position [11]. For practical use, 1 GHz is considered the limit above which ionospheric effects become negligible [7]; they should only be taken into account to a small extent when calculating the link budget.
The most variable attenuation in the atmosphere is caused by the weather [10], [13], [15]. In these cases, the changes in the medium are mainly caused by water in its various forms (hydrometeors). Generally speaking, this is often the reason behind fading. Special attention is paid to rain fade, its most acute form. The intensity of the rain, the average droplet size, the velocity, and the polarization of the antenna used also affect the attenuation that occurs [12]. The prediction and compensation of this effect is one of the problems to be solved for the successful application of ACM techniques in satellite communications [15].
The impact of tropospheric phenomena, i.e., weather, can be measured only to a moderate extent. In most cases, the link is a slant path (inclined-orbit satellite). This means that the line of connection forms an acute angle with the surface of the Earth at the point of the ground station [10], [15]. Therefore, what matters is not only the weather that can be measured locally but also the weather further away from the station. In summary, the phenomenon that causes the largest and most persistent attenuation change, and thereby reduces the channel capacity, is also the most complicated to observe externally. Entirely accurate weather forecasting would require a pervasive weather measurement system.

ACM Method Based on ModCod Prediction
A prediction refers to a statement or estimation about a future event or outcome based on available information, data, or analysis. For the present procedure, the question is what to predict: some quantity, from which the ModCod settings are then derived, or the appropriate setting itself [15]. Most of the ACM procedures in the literature [16]–[22] estimate either the SNR (signal-to-noise ratio) or some other characteristic quantity of the channel, which can then be used to select the appropriate settings for the overlying protocols. Various procedures and approaches of this kind already exist for the DVB-S2 protocol [23]–[26].
However, there are some previous studies where modern machine-learning-based solutions have been applied to modulation selection. In [27], the authors introduced a Convolutional Neural Network (CNN) [28] based approach for modulation classification in communication systems. The primary objective was to identify the modulation scheme employed in a signal based on the N-sample received vector. The study considered four modulation schemes: quadrature phase-shift keying (QPSK), 8PSK, 16-ary quadrature amplitude modulation (16QAM), and 64QAM, formulating the task as a classification problem. The proposed model aimed to outperform existing cumulant-based and Support Vector Machine (SVM) based algorithms, and the evaluation involved a comparative analysis of the three methods. The CNN-based model eliminated the need for manual feature selection, with the complex input samples converted to constellation diagrams in JPEG format. CaffeNet [29], a variant of the AlexNet [30] model, was utilized for training, testing, simulation, and result recording. The cumulant-based approach demonstrated higher accuracy in the high-SNR region (> 5 dB), while the SVM-based approach performed better in the low-SNR region. Notably, the proposed method exhibited performance comparable to the cumulant-based method in the high-SNR region and to the SVM-based method in the low-SNR region. In the high-SNR region, the accuracy for all modulation types approached 100%, and the proposed approach achieved error-free classification.
In [16], the authors propose an adaptive modulation scheme based on reinforcement learning, employing a DNN to enhance the spectral efficiency of wireless communication. The solution uses the average network exploration strategy (AE) to enhance the exploration efficiency of the agent. The reinforcement learning problem is addressed using the Q-learning algorithm. The cognitive radio is capable of selecting actions, observing their effects, and discovering new states and reward functions. The modulation schemes considered are binary phase-shift keying (BPSK), QPSK, 8PSK, and 16PSK. For each scheme, a rate region is chosen based on the instantaneous SNR. The study specifies the bit error rate (BER) approximation function and the loss function of the reinforcement learning neural network. Performance evaluation involves a four-layer fully connected network structure, and simulation results are obtained and compared for the AE and BER constraint strategies. It is observed that the AE strategy achieves maximum spectral efficiency in a few episodes compared to the BER strategy, with a faster convergence rate. As the average SNR increases, the maximum spectral efficiency obtained by AE surpasses that of the BER strategy. Such an adaptive modulation scheme proves beneficial for systems with limited bandwidth that need to transmit large amounts of data.
The main topic of the current research is how to combine estimation and decision into one step. To achieve this, it is necessary to incorporate the protocol, which in this case is DVB-S2 [6]. With an appropriately sized net and reasonably fast hardware, this approach can produce results faster, and possibly more accurately as well.

Theoretical Procedure
Classically, a measurement sits at the very bottom of any ACM procedure: we measure some signal or signals (for simplicity, we do not mark the scalar and vector cases separately; they are essentially the same). Denote the measured signal(s) by $s_{\mathrm{meas}}$ and the measurement error by $\varepsilon_{\mathrm{meas}}$. The measured signal is the input to the algorithms that select the ACM setting with the highest spectral efficiency (i.e., the highest one still within the error margin). In most cases, the measurement error is normally distributed, while the signal to be measured is lognormally distributed.
Theoretically, we can measure continuously, but digital systems process data and adjust ACM settings at discrete time instants. Denote these time instants by $t$. The ACM setting chosen at the beginning of a time slot is valid for the whole time step. Figure 2 shows the general case described so far. For simplicity, we number each ACM configuration (essentially treating them as categories). The configuration with the lowest number corresponds to the lowest spectral efficiency in a given system, and the others are arranged in increasing order of spectral efficiency. Denote by $\eta_i$ the spectral efficiency of the $i$th configuration. Among the possible options, two specific cases should be underlined: categories that never occur (e.g., "Category i+3") and categories whose probability of occurrence is insufficient.
The measurement can be expressed in the following form:

$$s_{\mathrm{meas}}(t) = s(t) + \varepsilon_{\mathrm{meas}}(t)$$

Fig. 2. The measured signal (scalar for simplicity; in general of arbitrary dimension) is marked in green. A category, which is in fact the ACM setting, can be assigned to it at any moment. The vertical dotted lines mark an arbitrary discretization; this is, in fact, the discrete time of the example system.
In the case of classical methods, we first use the measured value to estimate the descriptive quantity (e.g., SNR). During the learning process, on data from a given period, we train this algorithm, denoted $f(s)$. If we could achieve flawless training, we would obtain the relationship $x = f(s)$, where $x$ is the corresponding descriptive quantity. However, algorithms obtained through machine learning are typically burdened with errors, which we model as $\hat{f}(s) = f(s) + \varepsilon_{\mathrm{est}}(s) = \hat{x}$, where $\varepsilon_{\mathrm{est}}$ is the error of the estimator function. In other words, the result is acceptable only if it equals the correct one up to a small additive error. Any procedure that does not approximate the correct result as an additive perturbation by a random variable is unacceptable, and we do not deal with algorithms of this nature. In the case of a mean square error criterion, neither the bias nor the variance is zero [28], [31]. If our system is linear, even the initial distribution remains unchanged. Our ultimate goal is to estimate the best modulation index, denoted by $m$. For this, an additional function $g(x)$ is required which, following the previous logic, has an error denoted by $\varepsilon_{\mathrm{dec}}$. Consolidating the above, we obtain the following process:

$$\hat{m} = g\big(f(s + \varepsilon_{\mathrm{meas}}) + \varepsilon_{\mathrm{est}}\big) + \varepsilon_{\mathrm{dec}}$$

Summarizing the above, an error term appears three times in the case of classical methods. Moreover, two of these terms appear as arguments of separate functions, so during the training processes, attention must also be paid to dampening and handling them.
In the current research, by merging the estimation and decision steps along the lines described earlier, we arrive at the following form:

$$\hat{m}_2 = f_2(s + \varepsilon_{\mathrm{meas}}) + \varepsilon_2$$

The subscript 2 explicitly indicates that neither the algorithm error nor the estimated value is the same as before. In this case, however, the interpretation of the error is ambiguous. If we interpret the error in terms of ACM indices, its optimization may be straightforward, but a symmetric error criterion cannot be used in this case either. Ultimately, only the channel utilization matters: it is beneficial neither to leave available capacity unused nor to employ less robust settings too boldly.
Having fewer error terms and fewer independently optimized algorithms may enable a more accurate estimation. If the complexity of the DNN does not significantly increase the number of trainable parameters, the procedure essentially comes with advantages. In the current research, we examined only neural networks with fewer than 5000 parameters and did not investigate this question in more detail. To assess efficiency, a metric is necessary; ours is built from $\eta$, the spectral efficiency in time step $t$, with $m$ denoting the best and $\hat{m}$ the estimated ACM category. In the following, we present the feasibility of the theoretical DNN-based method outlined so far, comparing it with an existing solution.
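To make the asymmetry of the error concrete, the following sketch computes one possible channel-utilization score. It is only an illustration under stated assumptions and not necessarily the metric used in this paper: an over-optimistic prediction (predicted category above the best one) is assumed to deliver no data in that time step, while a too-cautious prediction delivers data at its own, lower spectral efficiency. The function name and the efficiency values are hypothetical.

```python
import numpy as np

def channel_utilization(eta, best, predicted):
    """Ratio of delivered to ideally deliverable spectral efficiency.

    eta       : spectral efficiency of each ACM category, sorted ascending
    best      : best (ground-truth) category index per time step
    predicted : predicted category index per time step
    ASSUMPTION: predicting above the best category loses the whole time step.
    """
    eta = np.asarray(eta, dtype=float)
    best = np.asarray(best)
    predicted = np.asarray(predicted)
    delivered = np.where(predicted <= best, eta[predicted], 0.0)
    return delivered.sum() / eta[best].sum()

# Hypothetical efficiencies (bit/s/Hz) for three categories.
eta = [0.5, 1.0, 1.5]
best = [2, 1, 0, 2]
predicted = [2, 0, 1, 1]  # exact, too cautious, too bold, too cautious
score = channel_utilization(eta, best, predicted)
print(f"utilization = {score:.3f}")
```

A symmetric index error would rate the second and third time steps equally (both are off by one category), whereas the utilization score penalizes the over-optimistic step much more heavily.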

The Used Satellite Connection
Alphasat is a sophisticated telecommunications satellite designed to provide advanced communication services across Europe, Africa, and the Middle East. Developed as a joint venture between the European Space Agency (ESA) and Inmarsat, Alphasat offers an array of cutting-edge features, including a high-capacity digital payload and an advanced onboard digital processor. The satellite operates at Q/V-band (and Ka-band) frequencies, enabling enhanced broadband services [32]. Our research is based on data traffic between this satellite and the Budapest ground station.
For the data transmission experiments, the original signal comes from the Graz GS, from where it is transmitted to the satellite in V-band. The transponder transmits in Q-band, which the Budapest GS also receives. As shown in Fig. 3, two dedicated channels are used for DVB-S2 experiments, transmitted at relatively close frequency bands with the same bandwidth. The payload signal, i.e., the data transmission (using the DVB-S2 protocol) used during our experiments in Budapest, is nominally at 38.1 GHz, which is also the frequency used in the link budget calculation in Tab. 1, while the Q-band beacon signal, used for our prediction, is transmitted at 39.402 GHz. The two signals are, by their nature, similar in their response to the modifying effects of weather [8], [15].
The prediction procedure is simple: the beacon signal is used to predict the best ModCod setting for DVB-S2 transmission, which is the option with the highest data throughput.

Main Features of the Two Signals
One month of data was used for the research, from 1 July 2017 to 31 July 2017, with a sampling period of 40 ms. In total, only a negligible amount of data is missing due to technical problems of the GS. The missing values were not replaced, but the values at the two edges of each gap were treated as continuous. Given the long period, their occurrence is negligible.
To examine the signals, the DVB-S2 modulation and coding settings possible during data transmission are of interest first, as this limits the total set of possible settings depending on the link budget, as shown in Tab. 2. Obviously, the larger the fade margin of a setting, the more robust it is. In return, it can transfer less useful data per unit of time due to its lower spectral efficiency. If the transmission does not reach even the minimum expected fade margin level, no data transmission is expected.
Due to computational capacity constraints, it is (and in many other cases may be) necessary to "window" a few consecutive packets that are transmitted with the same configuration. In this case, a real-time test has not yet been performed, only a simulation with historical data. However, time must be allowed for the prediction to be completed, transmitted to the transmitter GS via the Internet, and even for the satellite to know that the data is coming in with a different ModCod setting. To select the right option, it is necessary to categorize the data in some way. Theoretically, this requires $E_s/N_0$ for all symbols, but for a window, a roughly constant SNR value might be sufficient as a first approximation. Each of the categories that can occur was created in this way (i.e., by classifying the different SNRs into a ModCod setting), as shown in Tab. 3. In the "no signal" state (class, category), the most robust solution should be chosen, which is, in many respects, but not necessarily, the same as the first setting. The reason for the distinction is to know, when applying metrics, why a given setting was applied and with what expected throughput.
The SNR values were extracted from the DVB-S2 messages using the GNU Radio built-in MPSK SNR Estimator [33]. This method is based on the M2M4 estimator [34]. In some cases, the results obtained were presumably incorrect; these were corrected by manual labeling.
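The categorization step described above can be sketched as a threshold lookup. The lower bounds and labels below are placeholders chosen for illustration only; they are not the bounds of Tab. 3. Following the convention stated for that table, a value lying exactly on a bound is assigned to the upper category, which is what `bisect_right` does. Representing a window by its mean SNR is likewise an assumption made for this sketch.

```python
from bisect import bisect_right
from statistics import fmean

# HYPOTHETICAL lower SNR bounds (dB) of each ModCod class, ascending.
LOWER_BOUNDS_DB = [1.0, 2.2, 3.1]
# Class 0 collects everything below the first bound.
LABELS = ["no signal", "QPSK 1/2", "QPSK 3/5", "QPSK 2/3"]

def classify_snr(snr_db: float) -> int:
    """Return the class index; a value on a bound goes to the upper class."""
    return bisect_right(LOWER_BOUNDS_DB, snr_db)

def classify_window(snr_values_db) -> int:
    """Label a window of packets by its mean SNR (a first approximation)."""
    return classify_snr(fmean(snr_values_db))

for snr in (0.5, 1.0, 2.5, 5.0):
    print(f"{snr:4.1f} dB -> {LABELS[classify_snr(snr)]}")
```

Keeping "no signal" as a separate class index, even if it maps to the same transmit configuration as the most robust setting, preserves the distinction needed later when metrics are evaluated.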
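For orientation, the principle of the M2M4 estimator can be sketched in a few lines of NumPy. This is not the GNU Radio implementation; it is the textbook moment-based form for a constant-modulus (M-PSK) signal in complex Gaussian noise, where the signal power follows from the second and fourth moments as S = sqrt(2·M2² − M4) and the noise power as N = M2 − S. The test signal is synthetic.

```python
import numpy as np

def m2m4_snr_db(samples):
    """Moment-based SNR estimate (dB) for M-PSK in complex Gaussian noise."""
    power = np.abs(samples) ** 2
    m2 = power.mean()          # second moment E[|y|^2]
    m4 = (power ** 2).mean()   # fourth moment E[|y|^4]
    signal = np.sqrt(max(2.0 * m2 ** 2 - m4, 0.0))
    noise = m2 - signal
    if signal <= 0.0 or noise <= 0.0:
        return float("nan")    # estimate undefined for this block
    return 10.0 * np.log10(signal / noise)

# Synthetic QPSK test signal at a known SNR (NOT measured DVB-S2 data).
rng = np.random.default_rng(0)
n = 200_000
true_snr_db = 10.0
symbols = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))
noise_power = 10.0 ** (-true_snr_db / 10.0)
noise = np.sqrt(noise_power / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
estimate_db = m2m4_snr_db(symbols + noise)
print(f"true {true_snr_db:.1f} dB, estimated {estimate_db:.2f} dB")
```

Because the estimate relies on sample moments, short or heavily faded blocks can yield unreliable or undefined values, which is consistent with the need for occasional manual correction.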
There are several reasons behind the use of the beacon signal. On the one hand, it provides an opportunity to test and develop methods that control data transmission based on what happens in another frequency band. In the present case, the physical limitations of the satellite make the two frequency bands so close. On the other hand, generating a constant beacon signal is relatively simple, and it can be kept highly accurate so that any changes are mainly due to changes in the medium. Beyond this, it also allows, on a theoretical level, for studies where the user data traffic is not at a constant carrier power (however, this is not explored in more detail here).
During the period under investigation, the minimum power of the received signal was −112.6448 dBm, while the maximum was −51.2841 dBm. The beacon signal's cumulative distribution function (CDF) is shown in Fig. 4. The result follows a mostly normal distribution, which aligns with prior expectations.
Examining the cross-correlation between output and input, we find high values over hundreds of lags. The same is true for the autocorrelation of each quantity. The partial correlation behaves differently: for the input, it is significantly different from zero only for about 16 lags. This suggests a robust linear relationship between the input and the output, which makes the task easier. Given that the partial correlation of the input is only significant up to 16 lags, we can treat the task as a kind of moving average problem.
Tab. 3. The bounds of the settings in the classes (values in the limit case are always assigned to the upper category), with their occurrence rate. The measurement period was heavily loaded with rain fading, beyond which the maximum SNR was 5.897 dB.
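The partial-correlation check mentioned in the correlation analysis above can be reproduced with a short NumPy sketch. It estimates the partial autocorrelation by solving the Yule-Walker equations for each lag and flags the lags outside the approximate 95% band of ±1.96/√N. The series used here is a synthetic AR(1) process standing in for the beacon data, so the number of significant lags it reports says nothing about the measured signal.

```python
import numpy as np

def sample_acf(x, nlags):
    """Biased sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(nlags + 1)])

def sample_pacf(x, nlags):
    """Partial autocorrelation: last coefficient of the AR(k) Yule-Walker fit."""
    r = sample_acf(x, nlags)
    pacf = [1.0]
    for k in range(1, nlags + 1):
        # Toeplitz matrix of autocorrelations, entry (i, j) = r[|i - j|].
        idx = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
        phi = np.linalg.solve(r[idx], r[1 : k + 1])
        pacf.append(phi[-1])
    return np.array(pacf)

def significant_lags(x, nlags):
    """Lags whose partial autocorrelation lies outside +/- 1.96 / sqrt(N)."""
    band = 1.96 / np.sqrt(len(x))
    p = sample_pacf(x, nlags)
    return [k for k in range(1, nlags + 1) if abs(p[k]) > band]

# Synthetic AR(1) series (NOT the measured beacon signal).
rng = np.random.default_rng(0)
series = np.zeros(20_000)
for t in range(1, series.size):
    series[t] = 0.8 * series[t - 1] + rng.standard_normal()
print(sample_pacf(series, 5).round(3))
```

Applied to the real input, the largest significant lag gives a natural first guess for the length of the input window fed to the network.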

Fig. 4. Cumulative distribution function of the beacon signal received power, calculated from the measured values. The CDF gives the probability that a random variable takes on a value less than or equal to a given value. In this case, the random variable is the received beacon signal power.

The Deep Neural Network Method
A Deep Neural Network (DNN) is a type of Artificial Neural Network (ANN) that consists of multiple layers of interconnected neurons, allowing it to learn and extract hierarchical representations of data. It is called "deep" because it typically has more than two hidden layers, enabling it to model complex relationships and capture intricate patterns in the input data [28], [31], [35].
For DVB-S2, research results show that it is possible to obtain better estimates than with conventional methods by using NNs with memory [24]. So, the real question is whether we can get a better result by combining the estimation and decision components. In other words, is it better to treat the problem as a classification than to follow the previous procedures?
Given the existing data and the expected functionality, the problem to be solved is a supervised machine-learning classification [28], [31], [35]. The individual ModCod settings are the classes (see Tab. 3) that the structure should predict based on the input (the received beacon signal). In addition to investigating whether the problem can be solved, the research aims to find the smallest structure that still works. The rationale is that the smaller a structure is, the fewer parameters it has, the fewer hardware resources it requires, and the easier it is to deploy quickly. In addition to the above, two factors provided the basis for the design of the appropriate structure:
1. Previously, a DNN-based solution for binary fading classification has been applied [21]. The basic logic of the procedure is the same as in the current research; the difference is mainly in how the labeling was done (what was labeled) and that the output and input signals have different frequencies. To solve this problem, a neural network with long short-term memory (LSTM) cells and internal states was used, thus implementing a memory function [36]. The results obtained show that the accuracy of the method is good but not equally acceptable for all categories. This is due to imbalances in the data.
2. The process appears to be primarily of moving average type in terms of input. That is, some kind of averaging of the measured value over a few time intervals can theoretically give good results. The underlying argument, beyond the calculations, is quite simple. The transmitted beacon signal is constant, so its received value is affected only by changes in the attenuation of the medium (with the error of the measurement system being negligibly small). So if there is a change in one frequency band, it will affect the others in some way. Typically, rain fade is behind this, and there are already research results on how these events can be calculated from the received signal [37], [38]. This, in turn, provides the opportunity to implement more efficient fade handling than the structures discussed in the current research.
In summary, after training about 350-400 nets to find the best hyperparameters, the following structure best meets the goals and expectations: normalize the data as shown in (6), and then apply a neural net with the structure shown in Fig. 5:

$$s_{\mathrm{normalised}}(t) = \frac{s_{\mathrm{measured}}(t) - \mu}{\sigma} \qquad (6)$$

where $s_{\mathrm{measured}}(t)$ is the measured beacon signal value at sample time $t$, in dBm, $s_{\mathrm{normalised}}(t)$ is the normalised input value for the DNN, $\mu$ is the mean value of the train set and, similarly, $\sigma$ is the standard deviation of the same dataset. The first 15% of the total data length is the test set, the following 15% is the validation set, and the remaining 70% is the train set. The parameters of the normalizer were calculated from the values of the train set. Although part of the same data stream, the data used in the test set is neither a direct continuation nor a direct predecessor of the train set.
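The split and the normalization described above can be sketched as follows. The order of the partitions (test, validation, train) and the use of train-set statistics only follow the text; the function names and the synthetic beacon values are assumptions made for the example.

```python
import numpy as np

def chronological_split(series, test_frac=0.15, val_frac=0.15):
    """First part -> test, next part -> validation, remainder -> train."""
    series = np.asarray(series, dtype=float)
    n_test = round(series.size * test_frac)
    n_val = round(series.size * val_frac)
    test = series[:n_test]
    val = series[n_test : n_test + n_val]
    train = series[n_test + n_val :]
    return train, val, test

def fit_normalizer(train):
    """Mean and standard deviation, computed from the train set only."""
    return float(train.mean()), float(train.std())

def normalize(x, mu, sigma):
    """Standard score, as in the normalization equation of the paper."""
    return (x - mu) / sigma

# Synthetic beacon power values in dBm (NOT the measured data).
rng = np.random.default_rng(1)
beacon_dbm = -60.0 + 3.0 * rng.standard_normal(1000)

train, val, test = chronological_split(beacon_dbm)
mu, sigma = fit_normalizer(train)
train_n, val_n, test_n = (normalize(part, mu, sigma) for part in (train, val, test))
print(len(train), len(val), len(test))
```

Fitting the normalizer on the train set alone avoids leaking statistics of the validation and test periods into training, and placing the validation block between the other two keeps the test and train sets from being adjacent in time.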
There are three significant logical steps (Fig. 5): representation learning, LSTM and single-label classification.
Representation learning, often referred to as feature learning, is a fundamental concept in machine learning. It involves automatically discovering and extracting meaningful patterns or features from raw data. These learned representations capture the underlying structure and characteristics of the data, enabling more effective and efficient data analysis, classification, and decision-making in various applications. Long Short-Term Memory is a type of recurrent neural network (RNN) architecture designed to handle sequential data (labeled as memory in Fig. 5). LSTMs are particularly effective at capturing and preserving long-range dependencies and temporal patterns in data. They achieve this with a specialized gating mechanism that controls the flow of information through the network, allowing them to selectively remember and forget information as needed. An LSTM can thus remember both short- and long-term relationships.
For classification (see Fig. 5), every output is one-hot encoded (this holds for the current single-label classification, but it is also true in general) [39]. The training used the cross-entropy error (loss) function (see Eq. (7)) [40], where the softmax function assigns a certainty to each element of the vector [41]. The softmax output is the basis for the error calculation. The subsequent layer 9 selects the element of the vector with the highest value, which thus becomes the chosen class (hard decision):

$$L = -\sum_{i=1}^{C} y_i \log(p_i) \qquad (7)$$

where $y_i$ is the truth label ($i$th element of the one-hot encoded vector), $p_i$ is the softmax probability for the corresponding class, and $C$ is the number of classes.
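The three operations named above (softmax, cross-entropy against a one-hot label, and the final hard decision) can be illustrated with a framework-independent NumPy sketch. The logits are arbitrary example numbers, not outputs of the network in Fig. 5, and the small constant added inside the logarithm is only a numerical safeguard.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and softmax probabilities."""
    return -np.sum(y_onehot * np.log(probs + eps), axis=-1)

def hard_decision(probs):
    """Index of the most probable class."""
    return np.argmax(probs, axis=-1)

# Arbitrary example: three classes, the true class is class 0.
logits = np.array([2.0, 1.0, 0.1])
label = np.array([1.0, 0.0, 0.0])

probs = softmax(logits)
loss = cross_entropy(label, probs)
decision = hard_decision(probs)
print(probs.round(3), float(loss), int(decision))
```

Only the probability assigned to the true class contributes to the loss, so the training signal is unaffected by how the remaining probability mass is spread over the wrong classes.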

Reduced Imbalanced Data
If we observe the frequency of occurrence of each ModCod setting, we find that it is highly imbalanced. This causes training difficulties in many cases [42]. First, the rarest categories occur so seldom that predicting them incorrectly causes only a small error. Second, they occur so infrequently, and almost simultaneously, that for a time series, where the order of arrival of the data matters, it is not feasible to have them represented in every dataset. Thus, for the first tests, all classes below 1% were "eliminated" and assigned to the class "QPSK 2/3" by modifying the bounds. If the prevalence of a setting is so low, it is questionable whether it is really necessary for a given application. More precisely, the limit can be taken as the length of time (called a window) below which a change brings no substantial improvement (or is even worse) compared with choosing an option with lower efficiency. This exact limit depends on the transmitter and the receiver used. It is also influenced by the specific geographical location [43], so the current limit of 1% is considered to be the individual value of this land-based receiver, based on our experience. The procedure cannot predict classes on which it has not been trained, so there is no point in investigating them.
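The merging of rare classes can be illustrated with a short sketch. It relabels existing class labels rather than modifying SNR bounds as the authors did. The function name, the label strings, and the synthetic label counts are hypothetical. Only the 1% limit and the fallback class "QPSK 2/3" are taken from the text.

```python
from collections import Counter

def merge_rare_classes(labels, fallback, min_share=0.01):
    """Reassign every class whose share is below min_share to the fallback class."""
    counts = Counter(labels)
    total = len(labels)
    rare = {c for c, n in counts.items()
            if n / total < min_share and c != fallback}
    merged = [fallback if c in rare else c for c in labels]
    return merged, rare

# Synthetic label sequence: one class occurs in only 0.5% of the samples.
labels = ["QPSK 2/3"] * 600 + ["QPSK 3/5"] * 395 + ["QPSK 3/4"] * 5
merged, rare = merge_rare_classes(labels, fallback="QPSK 2/3")
shares = {c: n / len(merged) for c, n in Counter(merged).items()}
```

Merging into a more robust class is the safe direction: the reassigned time steps lose some efficiency but remain decodable.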

Weighted Softmax Loss Function
One possible solution to the imbalance problem is to modify the cost function so that it pays more attention to the correct prediction of infrequent values [42]:
$$L_w = -\sum_{i=1}^{K} w_i \, y_i \log(p_i)$$
where, with $y_i$, $p_i$ and $K$ as in (7), only one new term is introduced, namely the weight $w_i$ associated with the element $y_i$. In the research, five different options (labeled "A", "B", "C", "D", and none, which means the original softmax loss function was used) were tested, each based on the statistical distribution of the test data. The "A" weighting is defined in terms of $n_i$, the number of occurrences of the $i$th category in the test data, and $N$, the total number of test data. The "B" weighting uses the same notation together with $\max(n_i)$, the count of the most frequently occurring category. The "C" weighting is defined in terms of $K$, the number of classes, and $n_i$, the count of the $i$th category. The last option is the "D" weighting.
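As a separate illustration of the weighted loss, the sketch below uses a common inverse-frequency weighting, $w_i = N / (K n_i)$. This choice is an assumption made for the example only and is not claimed to be any of the weightings "A" to "D"; the class counts are hypothetical.

```python
import math

def inverse_frequency_weights(counts):
    """Assumed weighting for illustration: w_i = N / (K * n_i)."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * n) for n in counts]

def weighted_cross_entropy(one_hot, probs, weights, eps=1e-12):
    """Cross-entropy in which each class term is scaled by its weight."""
    return -sum(w * y * math.log(p + eps)
                for w, y, p in zip(weights, one_hot, probs))

# Hypothetical class counts for four ModCod categories.
counts = [700, 200, 80, 20]
weights = inverse_frequency_weights(counts)

# The same predicted confidence (0.7) for a frequent and for a rare true class.
loss_common = weighted_cross_entropy([1, 0, 0, 0], [0.7, 0.1, 0.1, 0.1], weights)
loss_rare = weighted_cross_entropy([0, 0, 0, 1], [0.1, 0.1, 0.1, 0.7], weights)
```

With equal confidence in the true class, the error on the rare class is penalized more heavily, which is the intended effect of the weighting.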

Results of the DNN
We applied all weights to the softmax loss function for the categories presented so far, including the case where we applied none. In the majority of cases, we used the Adam optimizer in the training [44]. When training stopped, the best-trained DNN was always considered to be the one with the lowest validation loss. The results are shown in Tab. 4. A more precise description of the size is essential for interpretation. The structure tested was the same throughout, as it performed best based on preliminary results. At the same time, the number of elements in each layer of the DNN was varied proportionally, always using the size of the LSTM as the parameter of interest (size parameter). In layers 2 and 4, the number of fully connected neurons was always the same, while in layer 3, it was twice as high.
Tab. 4. The size was between 10 and 20, based on the partial correlation of the input. Of all the options, the one that achieved the best value was published.
To interpret the results obtained, it is important to point out that the first two values, i.e., no weighting and option "A", are essentially identical. So far, we have not been able to classify one as better than the other. The other procedures (weighted "B", "C", "D") have so far not performed acceptably well compared to these two settings. The confusion matrix of the first network is shown in Fig. 6. The structure makes most of its mistakes when it predicts QPSK 2/3 instead of QPSK 3/5.
Compared to the initial distribution (see Tab. 3), the share of the first two categories is smaller than for the whole data set. The reason is that the period selected for testing contains proportionally fewer fades. However, the discrepancy seen at this boundary suggests that the algorithm has difficulty distinguishing between the beacon signals for the two cases. There are two plausible reasons for this:
1. There is a relatively small change in the beacon signal frequency, but at the data rate used it is significant enough to require a different ModCod setting.
2. Labeling is based on SNR estimation, which is based on the GNU Radio MPSK estimator. It is possible that the frequency band used in this algorithm is not sufficiently accurate and thus causes errors.
Based on previous research in the field, the two frequencies used (the beacon and the DVB-S2 signals) are close enough to be considered equivalent [8], [12]. It is conceivable that a structure with even more parameters could still improve on these results. However, the most likely scenario is that the SNR estimator is not accurate enough, thus compromising the labeling process.

More Input
Another way to get more learnable parameters is to use more inputs. This does not make the structure much more complex. We tested it in two ways: adding one more delayed input, and using two other signals, a moving average and a moving standard deviation. The window sizes of the last two signals are two additional hyperparameters that can be optimized. The tested versions did not bring a significant improvement.
The confusion matrix of the best-performing neural network with three inputs is shown in Fig. 7. The two additional signals used were a moving average with a window size of 200 (8000 ms) and a moving standard deviation with a window size of 15 (600 ms). The idea behind these signals is, on the one hand, their past successful application to problems of this kind. On the other hand, scintillation can be significantly reduced by simple averaging [21]. Compared to the previous results, there is no significant improvement or difference.
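The two additional inputs can be sketched as follows. The window sizes (200 and 15 samples) are taken from the text; the use of trailing (causal) windows that are shorter at the start of the series is an assumption, and the function names are hypothetical.

```python
import statistics

def moving_feature(samples, window, func):
    """Apply func to a trailing window ending at each sample (shorter at the start)."""
    out = []
    for t in range(len(samples)):
        start = max(0, t - window + 1)
        out.append(func(samples[start:t + 1]))
    return out

def build_inputs(samples, mean_window=200, std_window=15):
    """Return (value, moving average, moving standard deviation) per time step."""
    mov_avg = moving_feature(samples, mean_window, statistics.fmean)
    mov_std = moving_feature(samples, std_window, statistics.pstdev)
    return list(zip(samples, mov_avg, mov_std))

# Small example with short windows so the values are easy to check by hand.
inputs = build_inputs([1.0, 2.0, 3.0, 4.0], mean_window=2, std_window=2)
```

A trailing window uses only past samples, so the extra inputs remain available in real time.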

More Categories
It is impossible to teach additional ModCod categories based on our current data. There are not enough occurrences for them to appear in all three datasets. After all, it is useless to teach something if we can neither validate nor test it. If a category is not included in the training process, the network is not able to predict it either. The solution to the problem is to obtain additional data, which is also the obvious solution for the imbalanced dataset.
At the same time, the previous measurement results cannot be regenerated (no more SNR values were saved). To be able to test the solution, we modified the range of the categories. We not only decreased the size of the QPSK 2/3 category but also reduced the lower limit of every category by approximately 10%. With the artificial boundaries thus obtained, a sufficient amount of data is available; by definition, this already differs from the original situation.
The test results are similar to the best values so far. At the same time, estimating the newly created categories is not successful. Their average occurrence in the test case is less than 10-20 times, making the error insignificant. This approach is not suitable for solving the original task.

Usefulness for Data Transmission
The tests so far have all been about how the DNN performs in terms of training. Let us introduce a measure based on spectral efficiency, as this is primarily where we want the best utilization. It is computed by summing the spectral efficiency assigned to each time slot, with one condition: if the predicted setting is less robust than the ideal ModCod setting, zero is added; otherwise, the spectral efficiency of the predicted setting is added. For the "no signal" class, the spectral efficiency is 0. Normalizing this sum by the value obtained with the ideal ModCod settings gives the fraction of the channel that is used in terms of spectral efficiency.
The principle behind the method is simple. Given the expected error margin, we lose channel capacity when we do not use the highest available ModCod setting, but there is still operational throughput. The problem is not symmetric: if we choose a setting for which there is not enough fading margin to pass with the expected error, we have essentially wasted the time step. Therefore, this case counts as zero from a transmission point of view. Likewise, no matter what we predict, if "no signal" is the actual channel state, there is most likely no transmission. If we apply this to the best-performing procedure (see Fig. 6) with four categories, we obtain a channel utilization of 92%. Doing the same for a dataset with the complete set of possible options (i.e., using seven possible categories) gives 93.5%. It is essential to highlight that the prediction hits none of the three additional categories introduced. Nevertheless, efficiency improves slightly. It is conceivable that the "simplified" case could reach this value with further optimization of the hyperparameters. These numerical values are related to the accuracy and the error function but are not the same. Future research plans include investigating how to successfully apply an asymmetric error function that follows this unique property of data transmission. At the same time, using the symmetric error criterion also yields an exceptionally good result.
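The utilization measure described above can be sketched as follows. The class set and the spectral efficiency values are rounded, illustrative assumptions, and the function name is hypothetical. The sketch also assumes that, within this table, a higher spectral efficiency always means a less robust setting.

```python
# Illustrative, rounded spectral efficiencies (bit/s/Hz); assumed values.
EFFICIENCY = {
    "no signal": 0.0,
    "QPSK 1/2": 0.99,
    "QPSK 3/5": 1.19,
    "QPSK 2/3": 1.32,
}

def channel_utilization(ideal, predicted, efficiency=EFFICIENCY):
    """Spectral efficiency actually used, relative to the ideal ModCod sequence."""
    used = 0.0
    available = 0.0
    for ideal_mc, pred_mc in zip(ideal, predicted):
        available += efficiency[ideal_mc]
        # A less robust prediction than the ideal one wastes the time step.
        if efficiency[pred_mc] <= efficiency[ideal_mc]:
            used += efficiency[pred_mc]
    return used / available if available > 0 else 0.0

ideal = ["QPSK 2/3", "QPSK 2/3", "QPSK 3/5", "no signal"]
predicted = ["QPSK 2/3", "QPSK 3/5", "QPSK 2/3", "QPSK 3/5"]
utilization = channel_utilization(ideal, predicted)
perfect = channel_utilization(ideal, ideal)
```

In the example, the second time step still contributes (a more robust setting was chosen), while the third and fourth contribute nothing, which reflects the asymmetry discussed above.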

Spectral Efficiency Comparison and Channel Utilization
The spectral efficiency presented in the previous section can be applied to other cases. The most crucial point to note is that the basis used is always the manually cleaned SNR values, which obviously differ from the measured ones.
The ModCod control based on the estimated SNR is hysteresis-based to avoid switching too often [8]. Switching upwards requires at least 0.35 dB more than the theoretical minimum [8]. However, for a more robust lower transmission setting, it is sufficient to exceed the limit by 0.05 dB [8]. Over the period tested, the efficiency of this procedure is 62.85%. Using the unweighted DNN-based solution, it is 91.75%.
In addition, we performed a test where we cut out the roughly half hour of the test data that was most jagged, to see how the procedures worked during the "quiet" period. Of course, the removed period did not contain only "no signal" categories, and such categories still occurred in the remaining time. The rightmost column of Tab. 5 shows these values for the cases discussed so far.
Another important issue relates to signal propagation time. For a ModCod to take effect, 514.5 ms are needed, considering the full uplink/downlink path through the satellite [8]. This means that at a sampling interval of 40 ms (let us call it a time step in this case), we can get the best setting, but we can only apply the result 12 time steps later. Of course, the channel may change during this time, and the optimal setting will then be different. Although this is not what the current procedure is designed for, it solves the problem acceptably, as shown in Tab. 5. It is important to underline that, for the previous cases, the achievable channel utilization could theoretically be as high as 100%, but in this case it cannot. The previous cases do not address the timing constraints of the control, whereas the current ones do.
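A hysteresis rule of this kind can be sketched as below. Only the two margins (0.35 dB upwards, 0.05 dB downwards) come from the text; the threshold values, the class set, and the exact switching logic are assumptions for illustration and are not the controller of [8].

```python
# Assumed SNR thresholds in dB, ordered from most robust to most efficient.
THRESHOLDS_DB = [("QPSK 1/2", 1.00), ("QPSK 3/5", 2.23), ("QPSK 2/3", 3.10)]
UP_MARGIN_DB = 0.35    # extra margin required before switching upwards
DOWN_MARGIN_DB = 0.05  # margin required to keep (or fall back to) a setting

def select_modcod(snr_db, current_index):
    """Return the next ModCod index; -1 means no usable setting."""
    best_up = -1
    for i, (_, threshold) in enumerate(THRESHOLDS_DB):
        if snr_db >= threshold + UP_MARGIN_DB:
            best_up = i
    if best_up > current_index:
        return best_up  # enough margin to move to a more efficient setting
    index = current_index
    # Otherwise keep the current setting, or step down until the margin holds.
    while index >= 0 and snr_db < THRESHOLDS_DB[index][1] + DOWN_MARGIN_DB:
        index -= 1
    return index

def run_controller(snr_trace_db, start_index=-1):
    """Apply the rule along an SNR trace and collect the chosen indices."""
    index = start_index
    chosen = []
    for snr in snr_trace_db:
        index = select_modcod(snr, index)
        chosen.append(index)
    return chosen

# Hypothetical SNR trace in dB.
chosen = run_controller([3.5, 3.2, 3.1, 2.4, 3.2, 3.5])
```

In this trace, the same SNR of 3.2 dB leads to different settings depending on the previous state, which is how the hysteresis suppresses frequent switching.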

The physical limitations also show why it is not worth learning the remaining three ModCod settings. The total duration of their occurrences does not significantly exceed the time required for a change, and for the most sensitive setting it does not exceed it even once.
In the actual case, when timing is taken into account, it is worth measuring the channel utilization against the maximum that can actually be achieved (the last two rows of Tab. 5). Here, we observe that the newer method improves this value by about 30%, up to 95%.
The results show that this method is better than simple hysteresis-based methods that require SNR estimation. The disadvantage of a DNN-based method is that it is complex and time-consuming to prepare for a specific channel, but this can be somewhat reduced with online learning-based solutions. However, as the results show, the procedure can be more efficient, and there is no need for any intermediate step of quantity estimation, only the DNN calculations. It is particularly important to note that in the current training, the time constraints of controlling the real system were not taken into account; nevertheless, the algorithm performs well.

Conclusion
Based on the results, the procedure is functional and proves two things. On the one hand, it is possible to carry out control based on a band that is different from (but close in frequency to) the one where the data transmission occurs. The limitations of this method, e.g., how large the frequency difference between the data and control channels can be, may be the subject of future research.
However, the main subject of the current research is more exciting. Based on the calculations, the method presented in this research increases the channel utilization by about 30%, reaching at least 95% utilization. With proper labeling and supervised machine learning, an ACM procedure can be applied to predict the settings without intermediate quantity estimation or any decision procedure. In addition, the classification procedure can be easily adjusted to obtain more accurate results.
The limitation of the method is the labeling accuracy, as is usual with all classification tasks. Online learning-based solutions are expected to help with this. Another possible future direction of development is taking disturbing effects (rain fade, etc.) in the measured signal into account and performing the labeling based on them. In addition, it is worth pointing out that the best choice of the metrics used is also a matter for further research.
Parts of the models, data and codes that support the study are available from the corresponding author upon reasonable request.

Fig. 2. The measured signal (scalar, for simplicity) is marked in green (arbitrary dimension). A category, which is in fact the ACM setting, can be assigned to it at any moment. The vertical dotted lines mark an arbitrary discretization; this is, in fact, the discrete time of the example system.

Fig. 3. The Alphasat satellite and the two ground stations. The signals used are sent from the Graz station to the satellite, which transmits them to the receiver in Budapest. Messages related to ACM are transmitted via the Internet [8], [21].

Fig. 6. Confusion matrix of the first reported net. The "small" matrices on the far left, or below the matrix, show the elements closer to the confusion matrix, which are usually the correct value (expressed as a percentage). The elements further away show the results of the corresponding incorrect estimates.

Fig. 7. Confusion matrix of the best-performing neural network with 3 inputs. The "small" matrices on the far left, or below the matrix, show the elements closer to the confusion matrix, which are usually the correct value (expressed as a percentage). The elements further away show the results of the corresponding incorrect estimates.
The size of the fully connected layers was tested between 10 and 20, based on the partial correlation of the input. Other sizes, such as 50 and 100, were also tested. Of all the options, the one that achieved the best value was published. In the network shown in the table, the total number of learnable parameters is 2748. The tested version was implemented in MATLAB using elements of the Deep Learning Toolbox.
Channel utilization based on the choice of spectral efficiency.The method is the spectral efficiency used during correct transmission compared to the available one.The efficiency calculation is always based on the labeled data.