Influence of reconstruction settings in electrical impedance tomography on figures of merit and physiological parameters

Objective: Electrical impedance tomography (EIT) is a non-invasive and relatively cheap imaging technique allowing continuous monitoring of lung function at the bedside. However, image reconstruction and processing are not yet standardized for clinical use, limiting comparability and reproducibility between studies. In addition, optimal reconstruction settings still have to be identified for different clinical applications. In this work (i) a systematic way to select ‘good’ EIT algorithm parameters is developed and (ii) an evaluation of these parameters in terms of correct functional imaging and consistency is performed. Approach: First, 19 200 reconstruction models are generated by full factorial design of experiment in 5D space. Then, in order to quantify the quality of reconstruction, known conductivity changes are introduced and figures of merit (FoM) are calculated from the response image. These measures are further used to select a subset of reconstruction models, matching certain FoM thresholds, and are then used for in vivo evaluation. For this purpose, EIT images of one piglet are reconstructed to assess changes in tidal impedance and end-expiratory lung impedance, at positive end expiratory pressure of 0 and 15 cmH2O. From ground truth spirometry measurements, physiological criteria are formulated and the subset of models is further reduced. Finally, the remaining reconstruction models are evaluated on physiological data gathered from published data in the literature to assess the generalization possibilities. Main results: Parametrization of EIT image reconstruction has a strong influence on the resulting FoM and the derived physiological parameter. While numerous reconstruction models showed reasonable values for a single parameter, in total only 12 matched all simulation and physiological criteria. 
After validation on further physiological data, only a single reconstruction model remained with a noise figure of 0.3, target size of 0.08, weight radius of 0.3, normalized voltage and strong weighting of lung and heart regions. Furthermore, the relationship between the reconstruction settings and some FoM could be partly explained by using a linear statistical model. Significance: The quest for standard reconstruction settings is highly relevant for future clinical applications. Simulation measures might help to assess the quality of the reconstruction models, but further evaluation of more data and different experimental settings is required.


Introduction
Electrical impedance tomography (EIT) is gaining increasing importance for clinical applications by demonstrating its capability to provide meaningful physiological parameters (Gomez-Laberge et al 2012, Ukere et al 2016, Frerichs et al 2017). Ventilation parameters derived from EIT, e.g. tidal volume (Frerichs et al 1998, Balleza-Ordaz et al 2015), end-expiratory lung volume (Meier et al 2008), tidal recruitment (Muders et al 2012) or regional ventilation (Wrigge et al 2008, Elke et al 2013), have shown a high correlation with gold standard methods, while previous studies have shown an improved outcome for EIT-guided ventilation in acute respiratory distress syndrome (ARDS) (Wolf et al 2013, Eronia et al 2017). One of the biggest challenges in EIT is the uncertainty in the mathematical formulation of image reconstruction. Unlike computed tomography (CT), where electromagnetic waves are attenuated along a linear path, electrical currents used in EIT are diffusive and much more sensitive to changes near the electrodes. The mathematical formulation for image reconstruction is therefore highly 'ill-conditioned', requiring regularization in order to limit the solution space (Graham and Adler 2006). In practice, unknown domain boundaries (i.e. thorax and anatomical structures), thorax movement, electrode detachment and measurement noise add to the problem (Frerichs et al 2017). Since the choice of the reconstruction algorithm will determine the EIT image, the derived physiological parameters will vary accordingly.
Even though effort has been made to standardize EIT reconstruction and to quantify reconstruction quality (Adler et al 2009, Antink et al 2015), the selection of an optimal algorithm for a specific application is not yet fully understood. Further knowledge of the influence of reconstruction and the amount of expected variability is essential in order to standardize and provide comparability in EIT. The Graz consensus reconstruction algorithm for EIT (GREIT) (Adler et al 2009) is of particular interest here, since it optimizes image reconstruction against specific quality metrics, or FoM. Specifically, several small targets (usually with a diameter of 5% of the finite element model (FEM)) are placed in the forward model at different locations and from each target a response image is reconstructed. The resulting image is then evaluated against a desired image response, which should have low variation of amplitude response, position error and overshoot (or ringing) within a certain weighting radius (usually 25% of the FEM). Image response outside of the weighting radius around the placed target penalizes the solution, which favors reconstruction settings that produce accurate solutions. Smaller weighting radii or larger target sizes will therefore increase the penalization, considering the amplitude response. However, in combination with other FoM, e.g. position error and ringing, and their interrelations, the resulting optimized solutions are hard to predict. A thorough and systematic evaluation of the influence of these settings and the resulting FoM has not yet been performed, but might reveal important information on the internal function of GREIT and the relevance of these parameters for clinical analysis.
A detailed evaluation of image quality and contrast detection for EIT devices based on phantom measurements and FoM was performed by Gaggero et al (2015), while other efforts allowed the comparison between reconstruction models based on temporal features (Gagnon et al 2015). Previous studies have also investigated the influence of thorax shape error on the quality of reconstruction models (Ferrario et al 2012) and the importance of reconstruction parameters with respect to ventilation (Grychtol et al 2014, Thürk et al 2017), as well as hemodynamic parameters (Thürk et al 2016). The latter showed, for instance, that an increase in the size of the targets in the GREIT algorithm produces smoother images compared to smaller targets. However, a systematic analysis of the influence of forward model characteristics in combination with parameters of the reconstruction algorithms, including GREIT settings, has not yet been performed with respect to in silico and in vivo data.
The present work aims to quantify the performance of different GREIT reconstruction parameterizations using both simulation and physiological measures.

Method
The analysis comprises three major steps (compare figure 1): (A) Develop a systematic method to identify 'good' EIT image reconstruction parameter values based on simulation quality measures (i.e. FoM). (B) Investigate whether these values also result in correct functional EIT images in experimental piglet data. (C) Evaluate the consistency of the resulting EIT images with a validation against data from the comparison framework proposed by Grychtol et al (2014).
In addition to the identification of 'good' parametrization in (A), (D) the specific influence of reconstruction settings on FoM was assessed with a linear statistical model.

From the study of Thürk et al (2017), data of one representative piglet with a 64 cm thorax circumference and 30 kg body weight were used for the present analysis (Data set A). For this animal, a CT scan (Emotion 16, Siemens, Germany) at end-inspiration at positive end-expiratory pressure (PEEP) 5 cmH2O was acquired in order to obtain anatomical lung, heart and thorax contours. Then, a custom-made textile EIT belt (EIT Pioneer Set, Swisstom AG, Landquart, Switzerland) was placed along the 6th intercostal space (Waldmann et al 2017). EIT measurement settings were 3 mArms injection current at 195 kHz and a skip 4 injection pattern (Luppi Silva et al 2016), with a sampling frequency of 48 Hz. In addition to the displayed values on the respirator (Elisa 800, SALVIA medical GmbH, Germany), airflow was measured at 1 kHz using a standard airflow transducer (SS11LB, BIOPAC Systems Inc, CA, US) and tidal volume (TV) was calculated by integration. Airflow and EIT signals were synchronized by feeding the analogue output of the EIT device into a channel of the BIOPAC system. The animal was mechanically ventilated in pressure-controlled mode with individual breathing rates (around 20-30 breaths min⁻¹) to sustain normocapnia and the driving pressure was set so as to achieve a TV of 6 ml kg⁻¹. Two-minute measurements were performed at PEEP levels of 0, 5, 10 and 15 cmH2O after a 30 min adaptation period (after each step). To obtain robust measures, all derived parameters were averaged over all breathing cycles during these periods and the breathing rate was reduced to 6 breaths min⁻¹ with a driving pressure of 6 cmH2O. Note that for the present work, only data from 0 and 15 cmH2O (with TV values of 248 and 301, respectively) were used, since this step was expected to yield the highest impact on the ventilation parameters.
To avoid the animal suffering, every effort was made according to the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health (Kilkenny et al 2010).

Experimental Data set B-validation
Data set B was used to further validate the identified reconstruction models and is based on available data from the Functional Validation and Comparison Framework proposed by Grychtol et al (2014). Reference can be made to this work for a detailed description of the methods and data permissions. Briefly, experimental data were retrieved from eight anesthetized pigs during mechanical ventilation, and the respiration settings were modified in order to evaluate the robustness of EIT in different scenarios. All animals were ventilated in volume-controlled mode with a constant TV and a respiratory rate of 20 breaths min⁻¹ in order to maintain normocapnia. Over a period of 60 min, PEEP and the fraction of inspired oxygen (FIO2) were changed from 0 to 5 cmH2O and from 21% to 100%, respectively. With increasing PEEP, the ventral-dorsal ventilation distribution is expected to shift towards the dependent lung regions (Neumann et al 1999), while an increase in FIO2 should decrease ventilation in this region (Joyce et al 1993, Magnusson 2003). On the other hand, no change of TV, assessed by EIT, was expected at any point.

Forward and inverse models
In total, n = 19 200 different reconstruction models with different parametrizations for geometry (n = 6), tissue properties (n = 4), reconstruction criteria (n = 8 × 5 = 40), extent of regularization (n = 10) and voltage reference method (n = 2) were created. The underlying reconstruction parameters are explained in the following.
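As a quick plausibility check of the model count, the full factorial grid can be enumerated directly. The sketch below uses the level counts and ranges stated in this section; the tissue-property labels are placeholders, since the four settings are not listed here, and all variable names are illustrative.

```python
from itertools import product

import numpy as np

# Levels as stated in the text. The tissue labels are hypothetical
# placeholders: the four tissue-property settings are not listed here.
shape_fd = ["original", 15, 5, 3, 1, 0]               # thorax shape, n = 6
tissue = ["w1", "w2", "w3", "w4"]                     # placeholders, n = 4
ts_values = np.round(np.arange(1, 9) * 0.01, 2)       # target size 0.01..0.08, n = 8
wr_values = np.round(0.10 + 0.05 * np.arange(5), 2)   # weighting radius 0.10..0.30, n = 5
nf_values = np.round(np.arange(1, 11) * 0.1, 1)       # noise figure 0.1..1.0, n = 10
reference = ["TD", "NTD"]                             # voltage reference, n = 2

# One tuple per reconstruction model (full factorial design).
models = list(product(shape_fd, tissue, ts_values, wr_values, nf_values, reference))
print(len(models))  # 6 * 4 * 8 * 5 * 10 * 2 = 19200
```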

2.2.1.1. Thorax shape
CT slices at the belt plane were manually segmented using ITK-SNAP (Yushkevich and Gerig 2017) and the anatomical boundaries of the thorax, heart and lungs were extracted. Then, contours were smoothed by removing Fourier descriptors, fd, in order to obtain six different shapes with an increasing amount of shape mismatch, similar to Grychtol et al (2012). Specifically, after interpolating the original shape to 1000 spatial points, fd was cropped to 15, 5, 3, 1 and 0, where the latter corresponds to a circular contour (compare figure S1). The corresponding geometric error, ΔS, of these shapes was assessed by the symmetric difference, i.e. the non-overlapping area. This indicates how much (in percentage) of the total area of the original shape is not covered by the smoothed shape. From these contours, 2.5D FEMs were created by extruding a 2D model (Bahrani and Adler 2012), and these were further used for inverse model generation. For this purpose, 32 rectangular electrodes (size 0.5 × 2 cm) were placed equidistantly around the FEM.
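The contour smoothing can be illustrated with a minimal sketch of Fourier descriptor truncation. It is not the authors' implementation: the function name is hypothetical, and the rule for which harmonics are kept is an assumption chosen so that fd = 0 gives a circle, as stated above. The contour is assumed to be closed, evenly sampled and traversed counter-clockwise.

```python
import numpy as np

def smooth_contour(xy, fd):
    """Low-pass a closed contour by truncating its Fourier descriptors.

    Assumption (not from the paper): the mean (k = 0), the fundamental
    (k = +1) and all harmonics with |k| <= fd are kept, so that fd = 0
    yields a circle around the centroid.
    """
    z = xy[:, 0] + 1j * xy[:, 1]            # contour as complex samples
    n = len(z)
    coeffs = np.fft.fft(z)
    k = np.rint(np.fft.fftfreq(n) * n).astype(int)  # integer harmonic index
    keep = (np.abs(k) <= fd) | (k == 1)
    z_s = np.fft.ifft(np.where(keep, coeffs, 0))
    return np.column_stack([z_s.real, z_s.imag])

# Toy contour: an ellipse sampled at 1000 points (as in the text).
t = 2 * np.pi * np.arange(1000) / 1000
ellipse = np.column_stack([2 * np.cos(t), np.sin(t)])
circle = smooth_contour(ellipse, fd=0)
radius = np.hypot(circle[:, 0], circle[:, 1])
```

For this ellipse with semi-axes 2 and 1, fd = 0 returns a circle of radius 1.5, while fd = 1 already reproduces the ellipse exactly. Computing ΔS would additionally require a polygon or raster comparison of the two shapes, which is omitted here.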

2.2.1.3. Optimized EIT reconstruction
The GREIT reconstruction algorithm tunes reconstruction towards a set of requirements (or FoM). This is done by placing small conductive targets within the FEM and evaluating the corresponding response in the reconstructed image (see supplementary material for a detailed description (stacks.iop.org/PM/40/094003/mmedia)). The size of these targets, ts, as well as the desired reconstructed radius, wr, are usually set to 5% and 25% of the body, respectively. These values represent the recommendations of the GREIT authors, but further experimental validation was not performed. We therefore chose to evaluate different settings for these parameters, with ts ranging from 0.01 to 0.08 (0.01 step size) and wr ranging from 0.1 to 0.3 (0.05 step size).
In addition, the extent of the regularization of the optimization formulation, i.e. the value of the hyperparameter, can be adjusted according to a noise performance measure. One common definition of this measure is the noise figure, nf, defined as the ratio of signal-to-noise ratios (SNRs) in the raw voltage and in the reconstructed image, i.e. nf = SNR(voltage)/SNR(image) (Adler et al 2009). GREIT allows the selection of a specific nf and automatically identifies the corresponding hyperparameter value (Graham and Adler 2006). In this work, nf ranged between 0.1 and 1 (0.1 step size).

2.2.1.4. Difference data
The difference data, d, used for image reconstruction are obtained by referencing the measured voltages, u, against the voltages at a reference time instant, u_r. Here, either time difference (TD), i.e. $d = u - u_r$, or normalized time difference (NTD), i.e. $d = u/u_r - 1$, is typically used.
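The two reference methods differ only in whether the change is scaled by the reference voltage. A minimal sketch, with a hypothetical function name and toy voltages:

```python
import numpy as np

def difference_data(u, u_ref, normalized=False):
    """Return TD data (u - u_ref) or NTD data (u / u_ref - 1)."""
    u = np.asarray(u, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    if normalized:
        return u / u_ref - 1.0   # normalized time difference (NTD)
    return u - u_ref             # time difference (TD)

# Toy example: every channel increases by 10% relative to the reference.
u_ref = np.array([1.0, 2.0, 4.0])
u = np.array([1.1, 2.2, 4.4])
td = difference_data(u, u_ref)                     # scales with channel amplitude
ntd = difference_data(u, u_ref, normalized=True)   # 0.1 on every channel
```

In this toy case TD weights the channels by their absolute amplitude, whereas NTD reports the same relative change on every channel.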

Performance based on FoM
Approximately 1000 targets (one for each pixel position in the resulting image) were placed in the original FEM (i.e. with ΔS = 0) at the electrode plane (compare figure S4). From the reconstructed targets of each of the 19 200 reconstruction models, the following FoM were calculated (see figure 1(A)):
• amplitude response (AR) is the ratio of pixel amplitude of the target conductivity to the value in the reconstructed image;
• position error (PE) is the distance between the centers of gravity of the original target and of the reconstructed intensity;
• resolution (RES) measures the spatial extent of the reconstructed target relative to the size of the medium;
• shape deformation (SD) describes the fraction of the reconstructed target that does not fit the shape of the target (usually circular); and
• ringing (RNG) measures the image intensity of negative sign to assess overshoot effects.
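To make two of these measures concrete, the following simplified sketch computes PE and RES on a toy image. It assumes that the reconstructed target is delineated by thresholding at one quarter of the image maximum (the usual GREIT convention), that PE is a plain Euclidean distance in pixel units, and that the medium covers the whole image. The function names are hypothetical and this is not the EIDORS implementation.

```python
import numpy as np

def quarter_amplitude_set(img):
    """Binary mask of pixels at or above one quarter of the image maximum."""
    return img >= 0.25 * img.max()

def position_error(img, target_rc):
    """Distance (in pixels) between target center and reconstructed center of gravity."""
    rows, cols = np.nonzero(quarter_amplitude_set(img))
    cog = np.array([rows.mean(), cols.mean()])
    return float(np.linalg.norm(cog - np.asarray(target_rc, dtype=float)))

def resolution(img, medium_mask):
    """Square root of reconstructed-target area over medium area."""
    a_q = quarter_amplitude_set(img).sum()
    return float(np.sqrt(a_q / medium_mask.sum()))

# Toy response image: a 4 x 4 block of ones in a 32 x 32 image.
img = np.zeros((32, 32))
img[10:14, 20:24] = 1.0
medium = np.ones_like(img, dtype=bool)   # assumption: medium fills the image

pe = position_error(img, (11.5, 21.5))   # target placed at the block center
res = resolution(img, medium)            # sqrt(16 / 1024) = 0.125
```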
Average and coefficient of variation over all of the 1000 targets were then calculated for each model, yielding, for example, AR and AR_cv, respectively. The coefficient of variation, i.e. the ratio of standard deviation to the mean, was used, since distributions with higher mean will naturally have a higher standard deviation.
In order to distinguish favorable models from unfavorable models, a set of criteria was defined according to the reasoning established by Adler et al (2009) and Gaggero et al (2015). For AR, the average value describes the linear relationship between changes in conductivity and the reconstructed conductivity, i.e. it corresponds simply to an offset in relative EIT. More importantly, the variation of AR throughout the reconstructed image, i.e. AR_cv, should ideally be small in order to provide constant sensitivity. Considering PE, RNG, SD and RES, a small average as well as a small coefficient of variation is desired. We consider the absolute value of PE (treating an error towards the center as equivalent to an error towards the boundary). Since no numerical benchmark exists for the FoM, empirical values were derived from our data. Specifically, thresholds were defined based on quantile ranges, where smaller quantiles (stricter criteria) were chosen for FoM of greater importance. For instance, for AR_cv, the 50th percentile (or 0.41) and for RES, the 70th percentile (or 0.33) were chosen as thresholds (compare table 1).
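The per-model summary statistics and a percentile-based threshold can be sketched as follows. The data are synthetic stand-ins (200 hypothetical models with 1000 targets each), so the resulting threshold is not the 0.41 reported above; only the procedure is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic AR values: 200 hypothetical models x 1000 targets, with a
# model-specific spread so that the coefficient of variation differs.
spread = rng.uniform(0.1, 0.5, size=(200, 1))
ar = 1.0 + spread * rng.standard_normal((200, 1000))

ar_mean = ar.mean(axis=1)            # average AR per model
ar_cv = ar.std(axis=1) / ar_mean     # coefficient of variation per model

# Threshold at the 50th percentile of AR_cv over all models; models at
# or below the threshold satisfy this criterion.
threshold = np.percentile(ar_cv, 50)
selected = ar_cv <= threshold
```

Criteria for several FoM would be combined by a logical AND of the individual masks, which is why few models may survive when many thresholds are applied at once.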
In the first step, to identify the relationship between reconstruction parameters and FoM, Pearson correlation coefficients were calculated pairwise. Furthermore, the specific influence of the reconstruction settings on each of the FoM was expressed by using a linear model. Note that parameter values were stepwise discretized (e.g. nf to [1,2,…,10]) prior to the model fit in order to improve the interpretability of the results. In addition, to identify the parameters with the strongest influence on the FoM, a global sensitivity analysis was performed by calculating the partial derivatives, i.e. the local sensitivities at each point, and subsequently averaging them over a fixed parameter. For example, s_nf represents the global sensitivity of AR_cv with respect to nf, with K_nf = 10 as the number of parameter values for nf and k as the index. Since we are interested in the largest contribution to change caused by a parameter, the absolute value of each derivative was considered.
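The three analysis steps can be sketched on a synthetic FoM surface over two discretized parameters. The surface, its coefficients and the finite-difference approximation of the partial derivative are assumptions for illustration; the exact averaging used in the original analysis is not reproduced here.

```python
import numpy as np

# Discretized parameter levels, as described in the text.
nf_levels = np.arange(1, 11, dtype=float)   # K_nf = 10
ts_levels = np.arange(1, 9, dtype=float)
nf_grid, ts_grid = np.meshgrid(nf_levels, ts_levels, indexing="ij")

# Synthetic stand-in for a FoM such as AR_cv (coefficients are made up).
fom = 0.5 + 0.02 * nf_grid - 0.03 * ts_grid

# 1) Pairwise Pearson correlation between one parameter and the FoM.
r_ts = np.corrcoef(ts_grid.ravel(), fom.ravel())[0, 1]

# 2) Linear model fom ~ b0 + b1 * nf + b2 * ts via least squares.
X = np.column_stack([np.ones(fom.size), nf_grid.ravel(), ts_grid.ravel()])
beta, *_ = np.linalg.lstsq(X, fom.ravel(), rcond=None)

# 3) Global sensitivity: absolute finite-difference derivative along nf
#    (local sensitivities), averaged over all grid points.
local = np.abs(np.gradient(fom, nf_levels, axis=0))
s_nf = local.mean()
```

For a purely linear surface the global sensitivity equals the magnitude of the linear coefficient; differences between the two measures, as reported for nf in the results, point to non-linear behavior.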

Performance based on physiological data
The second stage of our analysis is the evaluation against experimental data. All reconstruction models were evaluated based on physiological parameters calculated from Data set A. Specifically, TV, ventral volume (VV) and ΔEELV were considered. TV was calculated as the sum over a tidal EIT image, VV as the sum over the ventral half of the tidal EIT image and ΔEELV as the sum of a difference EIT image between PEEP 15 and 0 cmH2O during expiration, i.e. the change of end-expiratory air content (compare figure 2). In order to define physiological criteria to quantify the performance of EIT reconstruction with regard to relevant physiological change, the change in TV and VV from PEEP 0 to 15 cmH2O and ΔEELV relative to TV at PEEP 0 cmH2O were evaluated for each model. The resulting ratios are denoted rTV, rVV and rEELV, respectively. As a ground truth for rTV and rEELV, the same ratios were calculated from the spirometry volume measurements, yielding reference values of 82% for rTV and 160% for rEELV. A tolerance of ±5% and ±25% for rTV and rEELV, respectively, was defined for a quantitative classification of reconstruction models. Since spirometry only provides a global value, no ground truth was available for rVV. Here, a change of at least 65% from PEEP 0 to PEEP 15 cmH2O was considered to be reasonable, since it has been shown in ARDS patients that ventilation shifts towards dorsal regions with increasing PEEP (Neumann et al 1999, Meier et al 2008, Spadaro et al 2018). In Data set B, the ventral-dorsal ventilation distribution was assessed by the center of ventilation (CoV), analogous to the calculation of the center of gravity. The CoV was then normalized between −1 and 1, with the image center represented by 0, non-dependent regions by positive and dependent regions by negative values. Since no CT images were available for Data set B, a model with averaged thorax, lung and heart contours had to be used here.
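The image-based measures described above can be sketched as follows. The sketch assumes that image row 0 is the ventral (non-dependent) edge and that the tolerances are absolute percentage points around the spirometry reference; both are assumptions, and the function names are hypothetical.

```python
import numpy as np

def tidal_measures(tidal_img):
    """Return TV, VV and normalized CoV of a tidal EIT image.

    Assumption: row 0 is ventral (non-dependent), the last row is dorsal.
    """
    n_rows = tidal_img.shape[0]
    tv = tidal_img.sum()                  # tidal volume surrogate
    vv = tidal_img[: n_rows // 2].sum()   # ventral half only
    # Row coordinate: +1 at the ventral edge, -1 at the dorsal edge.
    y = np.linspace(1.0, -1.0, n_rows)
    cov = (y[:, None] * tidal_img).sum() / tv
    return tv, vv, cov

def within_tolerance(ratio, reference, tol):
    """True if a ratio lies within reference +/- tol (same units)."""
    return abs(ratio - reference) <= tol

# Toy image with homogeneous ventilation: CoV lies at the image center.
tv, vv, cov = tidal_measures(np.ones((4, 4)))

# Classification against the spirometry reference for rTV (82% +/- 5%).
ok = within_tolerance(0.85, 0.82, 0.05)
not_ok = within_tolerance(1.00, 0.82, 0.05)
```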

Software
For the generation of FEMs, reconstruction models and subsequent image reconstruction, Netgen version 5.3 (Schöberl 1997) and EIDORS version 3.7.1 (Adler and Lionheart 2006) were used. Statistical analysis and signal processing were performed with proprietary MATLAB R2016b (The Mathworks, Inc., Natick, USA) scripts.

Simulation
Among the calculated FoM, strong variations between different reconstruction models were observed. For instance, AR_cv and PE ranged from 0.25 to 0.55 and from 0.029 to 0.087 (90% confidence interval), respectively. See also figure 3(A) for the distributions of all FoM. Strong negative correlations could be observed between wr and RES_cv (r = −0.88) and between ts and AR_cv (r = −0.74), while positive correlation was highest between wr and RES (r = 0.67), nf and SD_cv (r = 0.61), and ΔS and PE (r = 0.61).
For most FoM, the linear model could reasonably explain the influence of different reconstruction parametrization, e.g. for RES_cv and AR_cv with r² of 0.94 and 0.84, respectively. For SD and RNG_cv, the linear model showed low r² with 0.31 and 0.44, respectively. The parameters with the highest influence were ΔS and w_high, whereas w_low showed only minor influence. Models with ΔS above 4% strongly increased PE, RNG and SD, while also decreasing PE_cv. The extent of regularization, i.e. the value of nf, had a particularly strong effect on PE_cv and RNG_cv, where the former decreased and the latter increased with increasing nf. With increasing regularization, RES seems to decrease, which is plausible considering that the area of the reconstructed intensity, A_q, in the definition $\mathrm{RES} = \sqrt{A_q/A_0}$ (with A_0 the area of the medium) will be smaller with less smoothing. For all coefficients of the linear model, one can refer to table S1 and for the relationship between regularization and the FoM to figure S2. The global sensitivity analysis showed similar results for ΔS above 4%, having the strongest influence on most FoM. However, while nf did not show a significantly high linear relationship, s_nf was highest for RNG_cv, RES_cv and RNG. In addition, ts showed much higher values and produced the second highest sensitivity values for RNG_cv, RES_cv, SD_cv and RNG (compare table S2).

Application of models on framework
The 12 models fulfilling the physiological and simulation criteria were further used to re-validate data from Grychtol et al (2014). Compared to previous findings, R_A showed similar test results to the previously best performing model, R_GR. Interestingly, most models failed to produce TV values independent of PEEP or a CoV dependent on PEEP, as would be expected. For instance, the performance of R_H was almost identical to that of R_A, but was just below the significance threshold for the PEEP-dependent CoV. All reconstruction parametrizations were able to generate reproducible TV values for measurements at PEEP of 0 and 5 cmH2O. However, except for R_A, the models could not generate images with reproducible CoV values at PEEP 0. Similar to previous results, the dependence of CoV on FIO2 was not given in any model for the PEEP measurements.

Discussion
In this work, the performance of different reconstruction models, varying in their parameters of the forward and inverse solutions, was assessed in simulation and two experimental data sets. We show that the selection of these parameters is important, since wrong values might produce images that lead to physiological misinterpretation. At the same time, we seek to understand the set of parameter choices that yield good images from which reliable functional measures can be extracted. To investigate the relationship between reconstruction and image, a systematic methodology is proposed to describe the quality of the solution space, allowing the selection of 'good' parametrization. In detail, the evaluation included (1) the definition of thresholds for desirable FoM based on simulated targets; (2) the comparison of physiological parameters to the ground truth of spirometry; and (3) the evaluation of reconstruction against the functional validation and comparison framework presented in Grychtol et al (2014).
Both FoM and physiological parameters seem to be highly dependent on reconstruction settings, as different parametrizations could produce vastly different values (compare figure 3). Considering physiological measures separately, quite a large number of reconstruction settings produce reasonable clinical values. However, from 19 200 initial parametrizations, in total only 12 could match both simulation and physiology-based criteria, suggesting that the parametrization might need to be tuned for specific applications, as also suggested by Antink et al (2015). In fact, only a single model (R_A) displayed a similar performance to the best model (R_GR) within Data set B of the comparison framework. This model was parametrized with nf of 0.3, ts of 0.08, wr of 0.3, NTD and a high weighting factor of lung and heart tissue. Interestingly, even though using TD resulted in strong boundary artefacts, ten out of the best 12 reconstruction models had this voltage reference method configured. These artefacts, however, seem to distort the total impedance change throughout the image, reducing the ability of these models to correctly identify values of TV and CoV after changes in PEEP. It should be noted, however, that most of the reconstruction algorithms used in the previous work of Grychtol et al could not assess these changes either. To fully understand the underlying changes of TV and CoV during these states, future work should re-perform this evaluation using an imaging reference method such as CT.
Furthermore, it should be noted that even though the physiological significance was similar for the best models, the visual structure in the resulting images still shows large heterogeneity (compare figure 4). This point is fairly significant, since clinicians have certain expectations towards tomographic images in terms of consistency and similarity between patients. Visual differences among EIT images which, nevertheless, produce meaningful physiological measures could create confusion. This can be seen, for instance, in figure 4 for R_A, R_B and R_GR. While R_B and R_GR look very similar, only the latter could produce the same CoV value for two measurements with the same ventilation settings. On the other hand, even though impedance changes in R_A look completely different, the CoV was robustly assessed. The fact that the images looked quite different could also come from the arbitrary threshold selection of the FoM. Considering that small deviations of a best model's parameters (e.g. increasing the noise figure by 0.1, compare figures S2 and S3) should not strongly influence the resulting image, we believe that the chosen thresholds (and the inclusion of all FoM) might have been too strict and that from several reasonable image clusters we only detected some, and within these only the very best images. An exhaustive analysis of varying thresholds was unfortunately beyond the scope of this research, but further investigation using other combinations of thresholds might be interesting.

Table 2. Reconstruction models matching both FoM and physiology-based criteria with corresponding reconstruction settings: noise figure, nf; target size, ts; weighting radius, wr; reference method, ref (TD and NTD); and weighting of lungs and heart, w.
Concerning the generation of the FEM, it should be noted that the thorax diameter changes with increasing PEEP levels. The individualized model obtained at PEEP 5 therefore already had slight shape errors of 0.5% and 0.9% against PEEP 0 and PEEP 15, respectively. Even though we set the threshold for shape inaccuracy at 4% according to Grychtol et al (2012), this initial error might have influenced our analysis. Here, we also have to acknowledge that the averaged model used for Data set B will induce unknown errors. In addition, the animals did not have the same age and size in both data sets, which might have induced errors that we could not account for, due to the unknown internal anatomical structure in Data set B.
For the first time, the influence of reconstruction settings on simulation-based FoM was assessed, providing a basis for further evaluations. While it is interesting to see how FoM change with different reconstruction settings, we think that another relevant investigation in this regard would be the analysis of the relationship between FoM and physiological parameters. Thereby, simulation could provide information about the expected reconstruction quality of a reconstruction algorithm, in terms of physiological accuracy. However, further evaluation is necessary to establish this bridge, and should possibly also include a larger data set to evaluate quality (Antink et al 2015), which was not the scope of the present work. It should also be added here, that the proposed linear model will not be able to describe non-linear behavior and the interrelation between different reconstruction parameters.
Since no suggestions for FoM thresholds exist, simulation criteria were arbitrarily defined based on their importance and the distribution of our data. It is important to highlight that these criteria do not provide any objective measure for the quality of reconstruction algorithms yet, but have to be further investigated with additional measurements. Although spirometry data allowed reasonable thresholds to be set for the global physiological criteria, no reference was available for regional ventilation distribution. Even though a decrease of ventral ventilation has been reported in several previous works, the chosen threshold was arbitrarily selected and might have excluded or included too many configurations. It should also be mentioned that the full factorial design of experiment in this work required a vast amount of computation with a total calculation time of several weeks on multiple personal computers. While the availability of all possible combinations of reconstruction settings was necessary to describe their specific influence on FoM and physiological measures, future work could implement more sophisticated space sampling (e.g. Monte Carlo simulation or Latin hypercube sampling).
GREIT is one of the most used reconstruction algorithms for EIT images (Kobylianskii et al 2016) and this work provides further evidence not only of its ability to assess changes in lung physiology, but also of the importance of selecting reasonable reconstruction settings. Our goal is to provide a broader understanding of the 'ill-conditioned' nature of EIT reconstruction and the corresponding implications for the interpretability of physiological results. As shown, not only small changes in the measurement, but also small changes in the parametrization of the reconstruction algorithm can generate strong changes in the calculated image. For further validation of the presented approach, more experimental and clinical data will be highly valuable.
Commercial devices have to be considered as black boxes with unknown reconstruction settings, and while providing a solid foundation for clinical studies, the reproducibility or even improvement of images and derived results is often difficult. Since a transparent reporting of the applied reconstruction could help to identify optimal reconstruction settings for different purposes, we believe that future work should focus on establishing easy-to-use frameworks to allow researchers thorough validation of their data based on various different reconstruction algorithms and settings.

Conclusion
Reconstructed images and derived parameters in EIT strongly depend on the parametrization of the specific reconstruction algorithm. Various combinations for the reconstruction exist, which produce reasonable values for different clinically relevant parameters such as TV or EELV, but specialized optimization for different applications might be necessary. The identification of 'good' reconstruction models might be generalizable for different devices and measurement patterns.