Influence of Duodenal – Jejunal Implantation on Glucose Dynamics : A Pilot Study Using Different Nonlinear Methods

1Technological Institute of Informatics, Universitat Politecnica de Valencia, Alcoi Campus, Plaza Ferrandiz y Carbonell, 2, 03801 Alcoi, Spain 2Department of Cybernetics, Czech Technical University in Prague, Prague, Czech Republic 3Centre for Biomedical Engineering, Department of Mechanical Engineering Sciences, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford, UK 4Department of Electrical and Electronic Engineering, Imperial College, London, UK 5Department of Internal Medicine, Teaching Hospital of Mostoles, Madrid, Spain 6Department of Diabetes, Diabetes Centre, Institute for Clinical and Experimental Medicine, Prague, Czech Republic 7Department of Medical Biochemistry and Laboratory Diagnostics, General University Hospital, Charles University in Prague 1st Faculty of Medicine, Prague, Czech Republic 8Hepatogastroenterology Department, Transplant Centre, Institute for Clinical and Experimental Medicine, Prague, Czech Republic 9Obesitology Department, Institute of Endocrinology, and the Experimental Medicine Centre, Institute for Clinical and Experimental Medicine, Prague, Czech Republic


Introduction
Diabetes mellitus is a chronic serious disorder that affects nearly a half billion people in the world, and its prevalence is expected to grow significantly in the coming years [1].
It is also one of the greatest risk factors for cardiovascular diseases.In addition, the economical impact of the associated healthcare expenditure is enormous [2].
The global obesity epidemic is a relevant risk factor for diabetes, among many other diseases.As a result, weight 2 Complexity reduction methods have become a field of great medical and scientific interest.Successful long-term weight loss maintenance by strategies such as changing dietary habits, or increasing physical activity, is difficult [3].Other more aggressive approaches are necessary, especially when a significant weight reduction percentage is required, or more serious health consequences are involved.
One of these approaches is the utilization of pharmacotherapy, which in the context of diabetes can be focused on weight loss, or on glucose control, resulting in weight loss as a side effect [4].However, the long-term outcome is very limited [5].On the contrary, the most successful strategies are based on surgical therapies.The gastric bypass achieves permanent significant long-term weight reduction in most of the patients that underwent this surgery [6].
Nevertheless, the gastric bypass is a very invasive procedure, with pre-, peri-, and postoperative risks and possible sequelae [7].An intermediate solution is the socalled Duodenal-jejunal Bypass Liner (DJBL).The DJBL is a reversible implantable device used to mimic the effects of gastric by-pass by preventing the contact of chymus with duodenum and proximal jejunum and delivering it in a less digested form to more distal parts of the intestine.Obesity is a central physiopathological mechanism in type 2 diabetes mellitus (T2DM), and thus DJBL has been used to improve metabolic control in obese patients with T2DM [8].
The effects of DJBL implantation are broad and disparate.Many studies have assessed the impact of DJBL on a number of physiological parameters [9][10][11][12].We hypothesized that DJBL must have an influence on the glucose dynamics, as some studies about glucose variability and DJBL have pinpointed [13].These dynamics of the glucose time series offer new insights into glucose metabolism, and there seems to be a direct correlation between (the loss of) variability in the glucose time series and the degree of deterioration in glucose metabolism [14][15][16][17].
The glucoregulatory system is effectively a complex system, with several acting variables (caloric intake, exercise), a number of active hormones (insulin, glucagon, catecholamines, growth hormone, and incretins), and some well-established feedback and feedforward loops [18,19].The analysis of such a complex physiological system can be addressed using system dynamics characterization methods.Several methods well suited to time series of limited duration were used in this pilot study to characterize the effects of DJBL.Sample Entropy (SampEn) is a robust measure of regularity in sequences [20], whilst Lempel-Ziv complexity (LZC) is an easy to compute nonlinear algorithm to estimate the complexity in time series [21].Permutation Entropy (PE) is another complexity measure introduced by Bandt and Pompe in 2002 [22] as a robust method to deal with real-world time series.In spite of the proficiency of PE in time series analyses, it neglects equalities within signals.Modified PE (mPE) was proposed to address this shortcoming in the original PE algorithm [23].

Lempel-Ziv Complexity.
In order to compute LZC from a time series, the signal must first be converted into a sequence of symbols.In this study, the signal was parsed into a binary sequence using the median as the threshold (  ).For an input time series x = { 1 ,  2 , . . .,   } of length , the symbols in the binary sequence  = { 1 ,  2 , . . .,   } are created by The binary sequence  is then scanned from left to right to identify the different subsequences held within it and a complexity counter  is increased by one every time a new subsequence is found (a detailed description of the algorithm can be found in [21]).This complexity counter needs to be normalized to obtain a measure of complexity independent of the length of the time series [24]: LZC captures the dynamics of the time series by reflecting the rate of new subsequences present within it.

Permutation Entropy.
PE is a method measuring the entropy within a time series based on the probability of occurrence of all possible permutations of a certain length within it [22].With the exception of LZC, all other methods used in this study require the selection of values for different input parameters.In the case of PE, the computation relies on the selection of the embedding dimension  and the time delay .The choice of embedding dimension  is determined by the number of samples available, as the number of permutations must be less than the length of the time series (i.e., ! ≤ ) [22].
In order to compute PE as follows [23], embedding vectors need to be created from the original time series as follows: For each embedding vector, the lowest data point in the embedding vector is assigned a 0, the second lowest 1, and on until all data points in the embedding vector have been replaced with their ranking order.Once all possible embedding vectors in the time series have been created and ranked, PE can be calculated by applying Shannon's Entropy to quantify the proportion of possible permutations within the time series: where  is the number of different subsequence ranked vectors in the original time series and PA is the fraction of the subsequence ranked vectors.A less regular signal will have a greater range of embedding vectors and, therefore, a higher PE.
Complexity 3 2.2.1.Modified Permutation Entropy.One limitation of the original PE algorithm is that it ignores any repeated values in the embedding vector, assigning the first repeated value in the vector a lower integer in the ranking than subsequent repeats.This was addressed with the introduction of Modified Permutation Entropy (mPE) [23], in which repeated values are given the same ranking value.Then, entropy is evaluated applying Shannon's entropy as is done in PE [23].

Input Parameter Selection.
The outcome of PE will be influenced by the choice of embedding dimension  and delay .A greater value of  will give a greater possible range of ranking vectors and, therefore, a greater resolution.It has been recommended to use a range of values from  = 3 to 7 but the total number of permutations (given by !) must be less than the length of the original time series [22].However, small embedding dimensions might be too small to accurately track the dynamical changes in a signal [25].
Hence, PE and mPE were computed with  = 4 to 6.In terms of the time delay, a value of 1 was chosen as this would retain the original relationships between data-points [23].

SampEn.
SampEn was first defined in [20].SampEn starts by creating a set of embedded vectors x  of length : where  = 1, . . ., −+1.The distance between subsequences is then defined as According to a predefined threshold , two subsequences are considered similar if [  (),   ()] ≤ .This similarity is quantized for two consecutive subsequence lengths ( and  + 1), with   () number of  such that [  (),   ()] ≤ , and   () number of  such that [ +1 (),  +1 ()] ≤ .These two values  and  enable the computation of the statistics associated with SampEn: from which the final value for SampEn can be obtained: (for finite time series) The length  is usually given by the acquisition stage, but the parameters  and  must be carefully chosen to ensure an optimal performance of SampEn relative to class separability.This selection will be detailed in Section 2.3.1.

Input Parameter Selection.
The optimal selection of regularity estimators parameters  and  is still an open question.Frequent recommendations suggest adopting a parameter configuration in the vicinity of  = 2 and  = 0.2 [26].Nevertheless, this selection is lacking in terms of genericity, as many works have already demonstrated [27][28][29][30][31].
Although computationally more expensive, we varied SampEn parameters from 1 to 3 for  and from 0.01 to 0.30 for , in 0.01 steps ( was not multiplied by the standard deviation of the sequences since they were normalised in advance, zero mean, and unitary standard deviation).This enabled us to avoid the possible bias in the results due to the selection of a specific method for SampEn parameter optimization, despite still looking at regions usually recommended for  and  [20,26,32].

Experimental Dataset.
The database containing the experimental data of glucose time series was drawn from a database used to assess the endocrine effects of DJBL [13].There were 91 records from 30 participants with type II diabetes (20 men and 10 women, aged between 33 and 65).This database contained records taken before implantation (baseline, BL-, and 27 records), 1 month after implantation (01M+, 24 records), 10 months after implantation (10M+, 24 records), and 3 months after device removal (03M-, 16 records).Sampling frequency was 5 minutes.An example of records from each class in this database is shown in Figure 1.A very detailed description of the subjects can be found in the original paper [13].
As can be seen in Figure 1, the original records were noisy, with missing samples, and missing epochs completely at random.In order to avoid the influence of these artifacts on the results, missing single samples were linearly interpolated (mean substitution [33]).Records with less than 1440 samples (5 days) were excluded from the experiments, since the nonlinear methods used in the analysis are also very sensitive to number of samples [28].Record was then set to the central 1440 samples, if longer, to also avoid border effects [34] (tissue equilibrium, measuring device configuration, calibration, and stabilization).As a result of all this preprocessing, 60 records out of 91 were finally available.An example of resulting signals is shown in Figure 2.
Nevertheless, these records have not been analysed yet from a system dynamics standpoint, and our hypothesis was focused first on the two, in principle, most different scenarios: before DJBL implantation (BL-), and just before device removal (10M+).The rationale for this specific selection is that one month after implantation it will arguably be more difficult to find changes in glucose dynamics, due to the time passed.After DJBL removal, the glucose metabolism tends to return to that of the baseline period [11,13].Furthermore, quantitative endocrine effects seem to confirm that main differences are between these two stages, as shown in Table 1.Thus, in this table, 4 out of the 5 physiological features provide significant differences between 10M+ and BL-, giving Figure 2: Example of processed glycaemia records from the database.Single missing samples were linearly interpolated, longer missing epochs were removed, and shorter than 5 day-time series were discarded.Samples were taken from the central part of the records to avoid border effects.All these constraints are well featured in the changes that can be pinpointed with record 01M+ in this and previous Figure 1.
Records corresponding to baseline, 1 month, 10 months, and 3 months after DJBL removal, are depicted.Sampling frequency was 5 minutes.
quantitative support to the study selection.This support is in terms of a significance analysis of these differences obtained using Student's t-test ( = 0.05, sample size of 30, and normality not required [35]).As shown in the  column, only the differences in hip circumference could not be considered significant.The final experimental set was composed of 11 BL-records (positives P in the classification analysis) of 1440 samples and 11 10M+ records (negatives N) of the same length.These 22 records included the same 11 patients in both classes to ensure a paired test (7 males, 4 females), and the others in these classes were discarded.The dataset is relatively small, but the implantation of DJBL and glucose monitoring for  several days is very difficult and costly, in terms of workload, time, and resources.This table also includes a variability analysis result, the High Blood Glucose Index (HBGI).This index attempts to improve the assessment of glycaemic alterations through data transformation and is a well-established tool to estimate the risk of hyperglycaemia in diabetic patients.Average longterm blood glucose values are not a very reliable tool for glycemic control, but the analysis of short-term peaks and valleys has proven to have a much more clinical relevance [36].HBGI provides an estimation of the hyperglycaemia probability for the patients, and its differences have been found to be statistically significant for BL-and 10M+ in this case.

Class Separability Analysis. The separability of classes BL-and 10M+ was assessed by means of the Area under
Curve (AUC) of the associated Receiver Operating Curve (ROC), AUC-ROC.Statistical analyses based on paired Student's t-test or the Wilcoxon signed rank test, depending on the distribution of the data, were performed to assess the significance of the results.The acceptance threshold was set at  = 0.05.
Additionally, the classification capability of the results was quantified using the separability, specificity and accuracy classification performance indicators.They were obtained using the closest point to (0,1) in the ROC as the classification threshold.In this FRAMEWORK, positives (P) were assigned to the BL-class, negatives (N) to the 10M+ class, being sensitivity = TP / (TP+FN), specificity = TN / (TN+FP), and accuracy = (TN + TP) / (TN + TP + FN + FP ).

Results
All four methods showed a decrease of complexity between BL-and 10M+ (i.e.decrease of LZC, SampEn, PE, and mPE values).However, for LZC differences between the 2 classes were not significant (see Table 5).On the other hand, different combination of input parameters in SampEn, PE, and mPE resulted in significant differences between classes.
The results are expressed in terms of AUC, statistical significance, classification sensitivity, specificity, and accuracy.The threshold for classification was taken as the ROC point closest to point (0,1).These calculations were carried out for all the values in input parameters for which the AUC was at least 0.70.
Specifically for SampEn, all the AUC results are depicted as a heatmap in Figure 3.Most of the AUC values fall in the 0.50−0.60region, with more promising results at low  values ( < 0.10, optimal region), and in the 0.20 zone (suboptimal region).
In more detail, the numerical results for the highest AUC region are listed in Tables 2, 3, and 4.This corresponds to the optimal parameter zone, where some AUC values are above 0.80.
Tables 5, 6, 7, and 8 summarize the numerical results for the two classes, including mean and standard deviation.These values were computed using the parameter configuration scheme stated above.It is important to note that some configuration parameters did not yield significant results, such as  = 3 for PE (Table 6) and mPE (Table 7).As in previous similar studies [37][38][39], it seems the greater the embedded dimension , the better classification performance using PE-based measures.
In order to better illustrate the differences between the classes studied, a Leave-One-Out (LOO) test [37]  was applied using the data in Table 8.The classification threshold was set at the optimal SampEn value at which the classification accuracy was maximal.For both classes, there were 3 misclassified instances.Therefore, the overall classification accuracy using the LOO cross validation was 72.7%.As expected, the performance was lower than using all the records, but still very significant.This method was also applied to the Modified Permutation Entropy results in Table 7, when  = 6.In this case, there were 2 misclassified instances, achieving a classification accuracy of 82%.

Discussion
Our results show that it is possible to identify the effects of DJBL in the dynamics of glycaemia records with nonlinear analysis methods.A significant decrease in entropy (estimated with SampEn, PE, and mPE) of glycaemia records from BL-to 10M+ was observed.Complexity, quantified with LZC, also decreased in 10M+, but differences were not significant.
There is no gold standard for the unsupervised selection of parameters  and  for SampEn calculations, despite the numerous efforts in this regard [32,40,41].In order to leave no stone unturned, we adopted a maximalist strategy where a wide range of values were tested.As a result, this parameter analysis for SampEn provided an optimal combination with  = 3 and  = 0.09.In this case, the AUC was maximal, AUC= 0.8429, with significant (reject hypothesis) classification capabilities between BL-and M10+ (sensitivity=72.7%,specificity=90.9%,and accuracy=81.8%).However, there were other values for , with  in the vicinity of 0.09, that also yielded good significant accuracy.In fact, the results seem to be almost independent of .
The optimal value of  ( = 0.09), falls practically within the usually recommended interval,  ∈ [0.1, 0.25] [26].There is another region of acceptable results for  = 0.19.These specific values seem to be related to the resolution of the measurements, which was 0.1 mmol/L and the dissimilarity measure ( < 0.1, and  < 0.2).As for the  parameter, significant performance was achieved in all the cases tested.
As is also the case with SampEn, there is no consensus on the choice of input parameter values for the calculation of PE and mPE.However, some guidelines exist and were followed in this pilot study.Firstly, the delay was equal to 1 to guarantee that no downsampling of the original time series would occur.Secondly, the embedding dimension determining the size of the permutation vectors was selected taking into account its upper limit [22] and the reported results showing that small embedding dimensions could fail to identify changes in the dynamics of a signal [25].Therefore, a range of values from  = 4 to 6 was tested, with results showing that greater values of  lead to better discrimination between both classes.The highest accuracy was observed with  = 6 for both PE (77.27%) and mPE (86.36%), with the latter outperforming SampEn.It is worth noting that the entropy of glucose time series estimated with PE and mPE decreases from class BLto 10M+ for all subjects but two, but the subjects where this decrease is not observed are different for both methods.Furthermore, our results suggest that mPE is a more accurate method to characterize subtle differences in glucose time series than PE.
Despite the analysis limitations due to small database size, records length, and artifacts, the results confirmed there are differences between BL-and 10M+ records that can be associated with changes in the underlying glucose dynamics after DJBL implantation.With a high degree of accuracy (86.4%), it was possible to correctly distinguish between the The results are consistent.For most of the cases where AUC was relatively high (AUC> 0.75), the hypothesis was rejected and the classification accuracy was higher than 70%.The opposite also holds true when no significant difference was apparent.Namely, there is a good correlation among all the features used to assess the classification capabilities; there were no antagonistic results (AUC> 0.75 with  > ).

Conclusions
This paper explored the possible influence on the glucose dynamics of DJBL implantation.The study used several nonlinear methods.The best results were obtained with mPE calculated with an embedding dimension of 6 and with SampEn with input parameter values  = 3 and  = 0.09, although many other parameter configurations yielded suboptimal but relevant results.A similar approach was followed in other previous works related to blood glucose [42] or body temperature time series [43].
The performance of the method proposed could arguably be enhanced using other methods of theoretically better consistency [44].For instance, other modifications of PE can be considered, like fine-grained PE, based on incorporating the size of the differences between data-points into permutations and not just ranking them from smallest to largest [45], or the weighted PE, based on weighting permutation patterns depending on the amplitudes of their constituent data-points [46] Other effects should be studied, such as the influence of the artifacts, the characterization of the time-of-day variations (chronobiology), and the possible differences between other stages of the DJBL implantation.The availability of a validated set of methods for glucose dynamics assessment will arguably become a powerful tool for the study of disease onset and progression.
In summary, the DJBL implantation does alter the glucose metabolism of the subjects, and these changes can be detected by an analysis as the one proposed in this paper.This analysis may increase the clinical uses of the new information gathered.Additionally, there is room for improvement in terms of more accuracy and/or more classes.

Figure 1 :
Figure1: Example of raw glycaemia records from the database.Records corresponding to baseline, 1 month, 10 months, and 3 months after DJBL removal, are depicted.Sampling frequency was 5 minutes.Missing values were quite frequent (glucose readings spike down to 0 in the plots).Records like those shown here representing 03M-and BL-classes will be omitted in the experiments due to their short length.

Table 1 :
[13] characteristics and parameters of the complete database (Mean ± STD).Original results are taken from[13].

Table 2 :
Classification analysis results for PE in terms of highest AUC, including sensitivity (proportion of 10M+ correctly identified), specificity (proportion of BL-correctly identified), and accuracy (proportion of total cases correctly classified), and their significance .

Table 5 :
Lempel-Ziv Complexity individual, mean, and standard deviation (STD) values of the two classes studied, BL-and 10M+.

Table 6 :
Permutation Entropy individual, mean, and standard deviation (STD) values of the two classes studied, BL-and 10M+ for  = 3, 4, 5, 6.No significant differences between groups were found in some cases (Wilcoxon signed rank test).

Table 7 :
Modified Permutation Entropy individual, mean, and standard deviation (STD) values of the two classes studied, BL-and 10M+ for  = 3, 4, 5, 6.No significant differences between groups were found in some cases (Wilcoxon signed rank test).

Table 8 :
SampEn individual, mean, and standard deviation (STD) values of the two classes studied, BL-and 10M+.As far as we know, this is the first evidence in this classification context beyond variability and it opens a new perspective for the research of the DJBL implantation effects.