A Comparison of Personalized and Generalized LSTM Neural Networks for Deriving VCG from 12-Lead ECG

Vectorcardiography (VCG) is a valuable diagnostic tool that complements the standard 12-lead ECG by offering additional spatiotemporal information to clinicians. However, because acquiring the VCG alongside a standard 12-lead requires additional measurement hardware and too many electrodes for a clinical scenario, there is a need to find methods to derive the VCG from the ECG. We evaluated the use of Long Short-term Memory (LSTM) neural networks to learn the transformation from 12-lead ECG to VCG, both across subjects and for each individual subject. We refer to these networks as generalized and personalized, respectively. We calculated the Root Mean Square Error (RMSE), R², and Pearson correlation coefficient to compare waveforms of derived and actual VCG. We also extracted and compared diagnostic parameters from the VCG, namely the QRS-loop magnitude, T-loop magnitude, and spatial QRS-T angle, from actual and derived VCGs using the Pearson correlation coefficient and Bland-Altman limits of agreement. The personalized models performed better than generalized models in waveform comparisons and in the error of extracted diagnostic parameters from VCG waveforms. The use of personalized transformations for the derivation of VCG from the standard 12-lead has the potential to improve and augment the diagnostic yield and accuracy of a standard 12-lead interpretation.


Background
Clinical ECG consists of 12 leads (S12)-namely limb leads I, II, and III, augmented leads aVR, aVL, aVF, and precordial leads V1 through V6. Vectorcardiography (VCG) [1] is complementary to the S12. It is essentially the spatiotemporal representation of the cardiac vector in 3 orthogonal planes-namely vertical, transverse, and sagittal planes. S12 is the standard whereas VCG is rarely acquired. However, several conditions have more prominent VCG changes than S12, so it is a useful complement to S12. Furthermore, dynamic spatial and temporal information that can be derived from VCGs is unavailable from an ECG, which may enhance the automatic assessments of cardiovascular diseases [2].
Ernest Frank introduced the XYZ lead system known as vectorcardiography to provide a 3-dimensional representation of the cardiac vector. Figure 1 shows the electrode placement for vectorcardiogram and an example of a vectorcardiogram tracing for a healthy male subject.
As illustrated in Figure 1, the cardiac vector is the vector whose tip traces the boundary of the 3D loop circumscribed by the vectorcardiogram. Ideally, the three ECG leads X, Y, and Z would be orthogonal to each other and form a basis for the Cartesian space spanned by the cardiac vector. Figure 2 illustrates how the temporal X, Y, and Z lead waveforms translate to a spatiotemporal VCG.
The S12 requires ten electrodes on the skin while the Frank XYZ requires only 7 electrodes. There is only one electrode position in common (i.e., the left leg). Suppose all 15 leads are to be recorded with an ECG acquisition system, then sixteen electrodes should be placed on the patient's skin. Another practical issue with the location of the Frank XYZ electrodes is the rear electrodes. Patients can sleep on their backs, but having cables on their backs can be uncomfortable.
Figure 1. Electrode placement for the vectorcardiogram and an example tracing: resistor network needed to compensate for non-homogeneous human tissue, additional instrumentation beyond standard 12-lead ECG equipment [1]; (d) 3-D illustration of a single heartbeat from a 58-year-old healthy male [3,4].

Diagnostic Importance of VCG and Its Complementarity to ECG
Over several decades of research, three parameters extracted from the VCG waveform have come to be considered diagnostically important: the QRS-loop magnitude, the T-loop magnitude, and the spatial QRS-T angle. Figure 3 illustrates these parameters on a vectorcardiogram.

Figure 3. Illustration of the parameters that are extracted from a VCG: peak QRS magnitude, peak T wave magnitude, and spatial QRS-T angle [3,4].

Table 1 lists the clinical applications of these parameters that have been validated in the literature. The spatial QRS-T angle has been shown to be useful for risk stratification for cardiac events, evaluation of incident coronary disease and heart failure, and efficacy of therapy for adult hypertension and diabetes mellitus [5]. For example, in the PTB diagnostic ECG database [3,4] used in this study, the mean and standard deviation of the spatial QRS-T angles from patients with MI and healthy controls were 87.9° ± 46.84° and 52.95° ± 35.76°, respectively, as computed using the VCG parameter extraction algorithms described in this study.
Additionally, there are specific conditions where the VCG is considered superior to the ECG. VCG is more sensitive and specific than ECG in detecting atrial and ventricular enlargements. Due to the greater spatially localized information in a VCG, the suspicion of electrically inactive areas in the septal or anteroseptal walls of the left ventricle can be assessed with a VCG. The left ventricular mass, which is currently assessed with an echocardiogram, can be assessed with a VCG. VCG findings are better correlated to echocardiography findings than ECG findings. VCG has a greater diagnostic sensitivity than ECG for AMI when associated with a left anterior fascicular block [6]. Lastly, the myocardial damage caused by Chagas disease can be assessed with VCG findings complementing ECG findings [7]. Figure 4 illustrates the conditions and the location of the affected heart anatomy.

Since VCGs are not acquired during regular clinical settings, but standard 12-lead waveforms are acquired, there is a need to derive the VCG from the 12-lead ECG. Specifically, an arbitrarily complex transformation mapping the 12-lead ECG to the VCG is needed. Several research efforts have focused on arriving at a linear transformation of ECGs from standard 12-lead to Frank XYZ. However, the transformation is likely to be arbitrarily complex due to multiple underlying variabilities from person to person in terms of the distribution of fat, muscle, and organs in the torso where the ECG leads are measured. These complex variations suggest that we need methods capable of approximating arbitrarily complex transformations, such as neural networks [8,9]. Therefore, we used a class of neural networks, namely Long Short-term Memory (LSTM) [10,11], that might be best suited for time-series data regression tasks, such as transforming leads. Moreover, most recently, Sohn et al. [12] reported the successful use of LSTM networks to achieve accurate lead transformations. The following are the original contributions of this research:

•	We apply LSTM networks to the task of deriving VCG from 12-lead ECGs. Since LSTM networks require the pre-specification of several hyperparameters, we apply Bayesian global optimization to find the combination of these parameters that minimizes the error between derived VCG and actual VCG;
•	We apply transfer learning to obtain personalized transformations for each subject in the data set;
•	We compare the accuracy of extraction of VCG diagnostic parameters from derived VCG and actual VCG.

Related Work
Linear regression has been explored in the literature for lead transformation. Some studies have used open, publicly available data sets, whereas others have used closed data sets or data sets acquired with custom-built hardware devices. Between 1986 and 2009, the lead transformation of interest was from S12 to Frank XYZ. Closed data sets were used for some studies [13–16] and open data sets for others [17,18]. A neural network-based transformation was first proposed in 2010 [19]. Table 2 summarizes the works in the literature that focused on obtaining Frank XYZ from S12. Since then, several efforts have been made to reduce the number of leads that must be monitored while retaining the diagnostic power of S12. Most of these studies have tried to derive S12 from a three-lead ECG [12,19–24].

Table 2. List of related works that evaluated lead transformations from S12 to Frank XYZ. (N is the number of samples per ECG channel, y is the actual acquired ECG, and ŷ is the output ECG from the transformation).

Publication | Data Availability/Transformation Method | Reported Performance Metrics
Bjerle P et al., 1986 [11] | closed/Linear regression | Amplitudes of ECG waves QRS, ST, and T
Edenbrandt L et al., 1988 [12] | | Amplitude of R wave
Hyttinen J et al., 1995 [14] | | Pearson correlation
Guillem MS et al., 2006 [15] | open/Linear regression |
This work | open (PTB diagnostic ECG [3,4])/LSTM | RMS error; correlation coefficient; R²; QRS magnitude; T magnitude; spatial QRS-T angle

Several studies have used closed data sets that are unavailable to other researchers. We used the PTB diagnostic ECG repository for this study [3,4].
The Root Mean Square (RMS) error and the correlation coefficient are the most commonly reported metrics used to evaluate the error between generated or derived ECG and the ground-truth waveform. R-squared (R²) is also used in the literature. Table 2 includes the definitions and equations of these metrics. Some clinically relevant VCG-derived parameters can also be compared between the derived ECG leads and the ground-truth waveform. Therefore, the RMS error, correlation coefficient, R², QRS amplitude or magnitude, T wave amplitude or magnitude, and spatial QRS-T angle form a complete assessment.
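For clarity, the three waveform comparison metrics can be written out as in the following NumPy sketch. It uses the conventional definitions of RMSE, the Pearson correlation coefficient, and R², not the authors' code, and the function and variable names are ours.

```python
import numpy as np

def waveform_metrics(y, y_hat):
    """Compare a derived lead y_hat against the actual lead y: RMSE, Pearson r, R^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)

    rmse = np.sqrt(np.mean((y - y_hat) ** 2))       # root mean square error
    r = np.corrcoef(y, y_hat)[0, 1]                 # Pearson correlation coefficient
    ss_res = np.sum((y - y_hat) ** 2)               # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return rmse, r, r2
```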
The coefficients of the transformations reported in the literature are presented in Table S1 in the Supplementary Materials.

Experimental Setup
We had previously presented the methodology of training a generalized model and then applying transfer learning for a different problem, which was for the S12 lead ECG derivation from a subset of leads, namely Lead II, V2, and V6 [25]. However, in this paper, we evaluate the performance of the transformations from S12 lead to Frank XYZ lead. We present the methodology here for the convenience of the reader. All data analysis programs and applications were implemented using MATLAB 2021a Update 5 version 9.10.0.1739362 (MathWorks Inc., Natick, MA, USA) on a system with an Intel processor (i7-7820X), RTX 3090 graphics processing unit (NVIDIA Corp., Santa Clara, CA, USA), and 32 GB of RAM.

Source of Data and Data Preparation
The PTB ECG database available on PhysioNet [3,4] contains fifteen-lead ECGs sampled at 1 kHz from 290 patients. Some patients have multiple records, bringing the total number of ECGs to 549. Figure 5 plots the histograms that summarize the data set's characteristics.
This data set contains only one diagnosis per patient. As shown in Figure 5, a large proportion of the data set is MI patients and healthy controls.
The ECG signals were band-pass filtered using a second-order Butterworth filter with a passband from 0.05 Hz to 45 Hz, which is the bandwidth used for long-term rhythm monitoring according to AAMI standards [26]. Following filtering and suppression of frequencies beyond the passband, the signal was downsampled using decimation to avoid aliasing effects. First, ECG signal content in adults is below 100 Hz [26], so 200 Hz satisfied the Nyquist rate requirement to avoid aliasing. Second, lower sampling rates reduced the amount of data so iterations could be faster. Figure 6 shows an example of a recording from the data set before and after applying the above-stated data preparation steps. The data processing steps cause no visible distortion.
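As a concrete illustration of the preprocessing described above, the following sketch applies a second-order Butterworth band-pass filter and decimates from 1 kHz to 200 Hz using SciPy. The zero-phase filtering (filtfilt) and the exact function choices are our assumptions, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt, decimate

def preprocess_ecg(sig, fs_in=1000, fs_out=200, band=(0.05, 45.0)):
    """Band-pass filter (second-order Butterworth) and downsample one ECG channel.

    sig is a 1-D signal sampled at fs_in; the result is resampled to fs_out.
    """
    nyq = fs_in / 2.0
    b, a = butter(2, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, sig)                   # zero-phase filtering (our choice)
    factor = fs_in // fs_out                         # 1000 Hz -> 200 Hz gives a factor of 5
    return decimate(filtered, factor, ftype="iir")   # decimate applies its own anti-alias filter
```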
Three recordings were removed from the data set because of missing data or complete data corruption by noise. Table 3 lists them and the reason for exclusion.

Table 3. List of recordings that were excluded due to low signal quality or no signal.

Rejected Recording | Reason for Exclusion
Record 291 from patient 095 | V1 lead missing
Record 537 from patient 285 | No ECG data
Record 453 from patient 220 | Lead III data missing

Personalized Training Data Preparation
Data augmentation was performed using the sliding window method as used in [12]. Each sliding window was 17 s, and the overlap size was 16 s. We chose a window size of 17 s solely for formatting and initial review purposes for the S12 leads. We required 12 s of data to chart S12 in a standard clinical ECG format. We also symmetrically cropped 2.5 s of data from both ends of each 17-s-long segment so that we could be consistent across all segments. The beginning and end of several records had settling noise, such as baseline wander or powerline noise, so we removed these segments during data preparation. The 16 s overlap was chosen to maximize the overlap and the number of training samples available, following similar approaches in the literature that showed good performance using LSTM.
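A minimal sketch of this sliding-window augmentation, assuming a 200 Hz sampling rate and a channels-by-samples array layout (both our assumptions), is shown below; it is not the authors' code.

```python
import numpy as np

def sliding_windows(ecg, fs=200, win_s=17.0, overlap_s=16.0, crop_s=2.5):
    """Cut a (channels x samples) recording into overlapping, centre-cropped segments.

    A 17 s window advanced by 1 s (16 s overlap) is extracted, then 2.5 s is
    cropped from each end, leaving 12 s segments for training.
    """
    win = int(win_s * fs)
    step = int((win_s - overlap_s) * fs)
    crop = int(crop_s * fs)
    segments = []
    for start in range(0, ecg.shape[1] - win + 1, step):
        window = ecg[:, start:start + win]
        segments.append(window[:, crop:win - crop])  # keep the central 12 s
    return np.stack(segments)                        # (n_segments, channels, 12 s * fs)
```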

Neural Network Architecture
As mentioned earlier in Section 1.2, we used LSTM neural networks to learn a transfer function from S12 to Frank XYZ. The principal constituents of the LSTM network were the input gate (i), forget gate (f), and output gate (o). In addition, each LSTM network consisted of a cell state that was updated upon each timestep of input presented to the network.
The LSTM cell is a type of recurrent neural network. The output at time t − 1 influences the output at time t. Figure 7 presents a depiction of a single LSTM cell. The training process for the LSTM neural network involved a standard four-step sequence: forward propagation, cost computation, backward propagation, and weight update. This process was repeated for each item in the training set multiple times. The loss function was the mean squared error without normalization of the number of output dimensions or the number of ECG channels. Weight updates were performed using Adam optimizer [27].
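The loss equation itself did not survive extraction; one form consistent with the description above (normalization over the sequence length only, with the conventional factor of one half, both assumptions on our part) is

\mathcal{L} = \frac{1}{2S}\sum_{r=1}^{R}\sum_{s=1}^{S}\left(y_{r,s}-\hat{y}_{r,s}\right)^{2}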
where S was the sequence length of each ECG channel, R was the number of ECG channels in the output, ŷ was the instantaneous estimated output, and y was the instantaneous actual sample of the ECG.
Figure 7. Depiction of the computation occurring in a unit LSTM cell.
The LSTM network required selection of the following hyperparameters and architecture specifications prior to training: number of layers, number of hidden units per layer, learning rate, minibatch size, learning rate schedule (i.e., periodic changes as training progresses or fixed with no changes), and the weight optimizer parameters, namely the momentum coefficient (β1) and the root mean square (RMS) propagation coefficient (β2).
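For illustration, a sequence-to-sequence LSTM with these ingredients could be sketched as follows. The paper's models were implemented in MATLAB, so this PyTorch sketch (including the per-timestep linear output layer, and the specific learning rate and Adam coefficients, which are taken from the BO optimum reported in the Results) is only meant to make the architecture and the four-step training loop concrete.

```python
import torch
import torch.nn as nn

class Seq2SeqLSTM(nn.Module):
    """Sequence-to-sequence LSTM mapping the 12 standard leads to the 3 Frank XYZ leads."""

    def __init__(self, n_in=12, n_out=3, hidden=47, layers=1):
        super().__init__()
        self.lstm = nn.LSTM(n_in, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_out)  # per-timestep projection to X, Y, Z

    def forward(self, x):          # x: (batch, time, 12)
        h, _ = self.lstm(x)        # h: (batch, time, hidden)
        return self.head(h)        # (batch, time, 3)


def train_generalized(model, loader, epochs=100):
    """Forward pass, loss, backward pass, weight update, repeated over the training set."""
    # Learning rate and Adam coefficients follow the BO optimum reported in the Results.
    opt = torch.optim.Adam(model.parameters(), lr=0.062561, betas=(0.90025, 0.90035))
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for s12, xyz in loader:    # s12: (batch, time, 12), xyz: (batch, time, 3)
            opt.zero_grad()
            loss = loss_fn(model(s12), xyz)
            loss.backward()
            opt.step()
    return model
```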
The LSTM network architecture requires the specification of a list of hyperparameters. Bayesian optimization (BO) is a global optimization approach that is preferred in the literature for computationally expensive functions such as the training of neural networks [28].

Network Training Options
The 546 available records were split 80/20 between training and testing. The training set had 437 records, and the test set had 109. Network training was performed over 100 epochs for all networks, including the personalized networks.
BO did not include the number of layers as an optimizable variable, so that the effect of the number of layers could be assessed in a controlled way rather than as part of a probabilistic search like BO. Therefore, independent hyperparameter tuning was performed for 1- through 5-layer networks, and the results were compared across the number of layers to understand the impact of multiple LSTM layers on performance.
Hyperparameter Optimization Using BO

BO was used to find the optimal combination of values for the hyperparameters needed for the LSTM networks. The application of BO included three key elements:

1. A Gaussian process model Q(f | x, y); the final validation RMSE was the objective function f(x). The kernel function for the model was ARD Matérn 5/2;
2. A procedure for updating Q(f | x, y) after each iteration;
3. An acquisition function a(x) that was 'expected improvement' [29].
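The expected-improvement equation did not survive extraction; the standard form for a minimization objective, which is consistent with the description that follows, is

\mathrm{EI}(x, Q) = \mathbb{E}_{Q}\!\left[\max\!\left(0,\; \mu_{Q}(x_{\mathrm{Optimal}}) - f(x)\right)\right]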
where µ_Q(x_Optimal) was the minimum of the posterior mean and x_Optimal was the location of this minimum in hyperparameter space. To encourage exploration and prevent over-sampling of a region of the hyperparameter space around a local minimum, another criterion was added in addition to the acquisition function a(x). This condition was implemented as an additional restriction when choosing the subsequent x for evaluation. A candidate x had to satisfy the criterion in (3) to be selected as the next point to be evaluated.
where σ_f(x) represented the standard deviation of the posterior objective function at x and σ the posterior standard deviation of the additive noise. The optimizable variables, or hyperparameters, had to be defined in terms of bounds and the type of transformation to be applied prior to sampling. Table S2 in the Supplementary Materials lists the hyperparameters optimized for networks ranging from 1 layer to 5 layers. For each objective function evaluation, networks were trained for 100 epochs to allow adequate iterations to reach the lowest final RMSE.
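A compact sketch of such a BO loop, using a scikit-learn Gaussian process with an ARD Matérn 5/2 kernel and the expected-improvement acquisition, is shown below. It is a generic illustration: the additive-noise restriction in (3) is omitted, and the library, function names, and candidate-sampling strategy are our choices, not the MATLAB configuration used in the study.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(X_cand, gp, mu_best):
    """EI of candidate points relative to the minimum of the posterior mean."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-12)
    z = (mu_best - mu) / sigma
    return (mu_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(objective, bounds, n_init=5, n_iter=45, seed=0):
    """Minimal BO loop: GP surrogate (ARD Matern 5/2) + expected improvement.

    objective: maps a hyperparameter vector to the final validation RMSE.
    bounds: array of shape (dim, 2) with lower/upper bounds per hyperparameter.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    dim = bounds.shape[0]
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
    y = np.array([objective(x) for x in X])
    kernel = Matern(length_scale=np.ones(dim), nu=2.5)   # ARD: one length scale per dimension
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
        mu_best = gp.predict(X).min()                    # minimum of the posterior mean
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2048, dim))
        x_next = cand[np.argmax(expected_improvement(cand, gp, mu_best))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = np.argmin(y)
    return X[best], y[best]
```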

Training Personalized Networks
Transfer learning is the process of further training a pre-trained neural network using a different data set or subset of data [30]. We trained a personalized neural network for each patient using transfer learning with the optimal network architecture and hyperparameter combinations found by BO. The data set had 549 ECG recordings from 290 patients, averaging 200 s per recording, with a few patients having only 100 s of data.
As described in Section 2.1.2, network training was performed over 100 epochs with personalized data.
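A sketch of this personalization step, under the same illustrative PyTorch framing used earlier (copying the pre-trained generalized network and continuing training on one patient's windows with the BO-optimal hyperparameters), could look like the following; it is not the authors' implementation.

```python
import copy
import torch

def personalize(general_model, patient_loader, epochs=100):
    """Fine-tune a copy of the generalized network on a single patient's segments."""
    model = copy.deepcopy(general_model)   # start from the pre-trained (generalized) weights
    opt = torch.optim.Adam(model.parameters(), lr=0.062561, betas=(0.90025, 0.90035))
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for s12, xyz in patient_loader:    # windows drawn from one patient only
            opt.zero_grad()
            loss_fn(model(s12), xyz).backward()
            opt.step()
    return model
```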

Evaluation of Extracted VCG Parameters
As described previously in Section 1.1, three parameters extracted from the VCG, namely the peak QRS magnitude, peak T wave magnitude, and spatial QRS-T angle, were of diagnostic importance. These parameters were computed from the actual Frank XYZ leads and from the Frank XYZ leads derived by all the transformations. The algorithm for calculating these parameters began with the detection of the R wave of the ECG in the Vx lead. The QRS duration and the R wave durations were defined relative to the corresponding RR interval, as depicted in Figure 8. The peak QRS magnitude was calculated as the maximum of the L2 norm of (Vx, Vy, Vz) within the QRS duration time window. Similarly, the maximum within the T wave duration time window was the peak T wave magnitude. The QRS-T angle was computed using Equations (4) through (6).
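Equations (4) through (6) did not survive extraction; a reconstruction consistent with the definitions given below and with the usual definition of the spatial QRS-T angle (the exact form used by the authors may differ) is

QRS_{i} = \int_{QRS} V_{i}(t)\,dt, \quad i \in \{x, y, z\} \quad (4)

T_{i} = \int_{T} V_{i}(t)\,dt, \quad i \in \{x, y, z\} \quad (5)

\theta_{QRS\text{-}T} = \arccos\!\left(\frac{QRS_{x}T_{x} + QRS_{y}T_{y} + QRS_{z}T_{z}}{\sqrt{QRS_{x}^{2}+QRS_{y}^{2}+QRS_{z}^{2}}\,\sqrt{T_{x}^{2}+T_{y}^{2}+T_{z}^{2}}}\right) \quad (6)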
where QRS_x, QRS_y, and QRS_z were the areas under the curve of the QRS complex in the X, Y, and Z leads, respectively, and T_x, T_y, and T_z were the areas under the curve of the T wave in the X, Y, and Z leads, respectively. Several integration methods (for example, the trapezoidal rule, Simpson's rule, or Simpson's 3/8 rule) could be used to calculate the area [31]. In this implementation, we used the trapezoidal rule. We compared the extracted parameters using Pearson's correlation coefficient and the Bland-Altman limits of agreement.
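Putting the above together, an illustrative NumPy/SciPy sketch of the parameter extraction (assuming the QRS and T-wave windows have already been located, and with function and variable names of our choosing) is

```python
import numpy as np
from scipy.integrate import trapezoid

def vcg_parameters(vx, vy, vz, qrs_idx, t_idx, fs=200):
    """Peak QRS magnitude, peak T magnitude, and spatial QRS-T angle from Frank XYZ leads.

    qrs_idx and t_idx are contiguous sample-index arrays for the QRS and T-wave
    windows, assumed to have been located beforehand relative to the RR interval.
    """
    xyz = np.vstack([vx, vy, vz])                 # 3 x N array of the X, Y, Z leads
    mag = np.linalg.norm(xyz, axis=0)             # instantaneous spatial vector magnitude
    peak_qrs = mag[qrs_idx].max()                 # peak QRS-loop magnitude
    peak_t = mag[t_idx].max()                     # peak T-loop magnitude

    # Areas under the curve of each lead within each window (trapezoidal rule).
    qrs_vec = np.array([trapezoid(lead[qrs_idx], dx=1.0 / fs) for lead in xyz])
    t_vec = np.array([trapezoid(lead[t_idx], dx=1.0 / fs) for lead in xyz])
    cos_angle = qrs_vec @ t_vec / (np.linalg.norm(qrs_vec) * np.linalg.norm(t_vec))
    qrs_t_angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return peak_qrs, peak_t, qrs_t_angle
```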

Results
As part of the BO experiment, we trained 250 neural networks: 50 networks each for 1- through 5-layer networks. Figure 9 shows the overall results of the BO for 1- to 5-layer neural networks to transform ECG leads from S12 to Frank XYZ leads. The following set of hyperparameters resulted in the optimal final RMSE: number of hidden units = 47; minibatch size = 27; learning rate schedule = piecewise; β1 = 0.90025; β2 = 0.90035; and learning rate = 0.062561.

The 1-layer network was found to have the lowest validation RMSE (0.0955 mV). Overall, there is an insignificant difference in RMSE across the number of layers; the best and the worst RMSE differ by only 5 µV.

Comparison of Performance Metrics
The metrics for quantitative comparison of waveforms in this study were RMSE, R², and the Pearson correlation coefficient. Table 4 provides the results for all the methods of transformation implemented in this study.


Comparison of Extracted Diagnostic Parameters
As described in Section 2.5, three diagnostic features of the VCG waveform were computed from the actual and derived XYZ waveform data. The features computed from the actual data are treated as actual measurements, and those computed from the derived data are treated as measurements from a test device or methodology. In this case, the methodology is the transformation of ECG leads through a general and patient-specific personalized model. The metrics used for comparison are Pearson's correlation coefficient and the Bland-Altman limits of agreement [32]. We further present the effect size for comparison of VCG parameters from each transformation method and actual VCG. The effect size metrics include Cohen's U1, U3 [33], and common language effect sizes [34]. We also present t-test results with the t-statistic and the associated p-value. The t-test p-values in this case should be greater than 0.05 if we are to accept the null hypothesis that the difference in means is not significant (i.e., the transformation method yielded results for VCG parameters that were comparable or similar to those obtained from the actual VCG).
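For reference, the core agreement statistics can be computed as in the sketch below. It uses SciPy's paired t-test and the conventional 1.96-standard-deviation Bland-Altman limits, both of which are our assumptions about the exact procedure; the effect-size measures (Cohen's U1/U3, common language effect size) are omitted for brevity.

```python
import numpy as np
from scipy import stats

def agreement_stats(actual, derived):
    """Pearson r, Bland-Altman bias and limits of agreement, and a paired t-test."""
    actual = np.asarray(actual, dtype=float)
    derived = np.asarray(derived, dtype=float)

    r, _ = stats.pearsonr(actual, derived)            # correlation of the parameter values
    diff = derived - actual
    bias = diff.mean()                                # mean of differences
    half_width = 1.96 * diff.std(ddof=1)              # conventional 95% limits of agreement
    loa = (bias - half_width, bias + half_width)
    t_stat, p_val = stats.ttest_rel(derived, actual)  # paired t-test on the means
    return {"pearson_r": r, "bias": bias, "loa": loa, "t": t_stat, "p": p_val}
```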

Peak QRS-Loop Magnitude
The personalized models showed the highest correlation coefficient values and the smallest limits of agreement, indicating that the derived peak QRS magnitudes were closest to the values computed from the actual data. Table 5 lists the correlation coefficients for the different transformation methods in descending order along with the statistical measures of comparison and effect sizes. Table 6 presents the Bland-Altman limits of agreement between QRS-loop magnitudes extracted from actual and derived VCG waveforms. Figure 10 presents the Bland-Altman plots for QRS-loop magnitude comparison.

Table 6. Bland-Altman limits of agreement between QRS-loop magnitudes extracted from actual and derived VCG waveforms.

Peak T-loop Magnitude

The personalized models show the highest correlations and the smallest limits of agreement. The generalized models perform comparably to the better-performing transforms from the literature. Table 7 shows the methods' respective correlation coefficients sorted in descending order along with the statistical measures of comparison and effect sizes. Table 8 presents the Bland-Altman limits of agreement between the peak T-loop magnitudes computed from the actual VCG waveforms and the derived waveforms across different methods of derivation. Figure 11 presents the Bland-Altman plots for the comparison of peak T-loop magnitude.

Table 7. Correlation coefficients between the peak T-loop magnitudes computed from the actual and derived VCG waveforms.

Table 8. Bland-Altman limits of agreement between the peak T-loop magnitudes computed from the actual VCG waveforms and the derived waveforms across different methods of derivation.


Mean Spatial QRS-T Angle
The personalized models show the highest correlations and the smallest limits of agreement. The generalized models perform comparably to the better-performing transforms from the literature. Table 9 shows the methods' respective correlation coefficients sorted in descending order along with the statistical measures of comparison and effect sizes. Table 10 presents the Bland-Altman limits of agreement between the mean spatial QRS-T angles computed from the actual VCG waveforms and the derived waveforms across different methods of derivation. Figure 12 presents the Bland-Altman plots for the comparison of the spatial QRS-T angle.

Figure 11. Comparison of Bland-Altman limits of agreement for peak T-loop magnitudes across transformation methods. The red horizontal line indicates the mean of differences.

Discussion
The findings in this study indicate that personalized transformation models are preferable, but there are limitations to interpreting the results and practical considerations. The data set for this research is widely available, supporting further research and reproducibility of these results. However, the amount of data available is restricted to a small population that is not geographically or ethnically diverse. This raises the potential for overly optimistic results in this study. Future studies should evaluate additional data sources from other geographic regions to confirm that these inferences are valid. Furthermore, there is only one diagnosis available per patient as the reason for hospitalization. ECG and VCG interpretations are unavailable. Comparisons of diagnostic yield and outcomes will require specific waveform interpretations, as well as longitudinal follow-up with patients, so that we can evaluate how the clinical management of patients was impacted by the availability of VCG in addition to S12. In the absence of this information within the current data set, we could only evaluate the performance in terms of quantitative measures. We have presented effect sizes as statistical measures that could help with the evaluation of various transformation methods. However, there is no reasonable or equivalent comparison available in the literature thus far regarding an absolute interpretation of these results. The effect sizes can be compared across transformations in this study and reveal that generalized LSTM, personalized LSTM, and personalized linear regression perform better than other methods in that they have the smallest effect sizes when compared to the actual VCG in terms of the values obtained for the VCG diagnostic parameters.
The findings indicate that personalized LSTM and personalized linear regression methods lead to nearly identical results, with marginally better performance for personalized LSTM. Since the S12 leads cover the ventral plane of the body, it is plausible that the association between S12 and at least the X and Y leads of the VCG is nearly linear, so that these leads can be derived using linear models. The comparisons between Z leads of the VCG derived from the methods reveal a larger difference than for the X and Y leads. A future avenue of research may be to specifically explore Z-lead comparisons to understand whether there is further scope for an improvement of performance with other lead transformation methods.
It is possible that neural network architectures other than LSTM may lead to better lead transformation performance. This study only explores LSTM and not its variants. The choice of LSTM for this work was based on recent findings in the literature that demonstrated the use of this architecture to obtain acceptable results for the problem of lead transformations [12]. Instead, this work explores personalization and its impact on lead transformation performance, and only the LSTM architecture was evaluated. Several architectures could be explored in this manner in future research.
We chose to downsample and filter the ECG waveforms as part of the preprocessing step. There could be different findings if the ECGs were retained at the 1 kHz sampling rate and without filtering. Since the entire data set was preprocessed in the same manner, and all transformation methods were evaluated on the same data, there is no expectation that there would be bias in the results presented herein. However, an empirical evaluation of the impact of preprocessing may be beneficial to explore in a future study, with the evaluation of sampling rate and signal conditioning approaches as the goal.
From a practical perspective, implementing the personalized models would require acquiring 15-lead ECGs for each patient, which is not currently part of standard care and would result in added costs and work for healthcare professionals. Furthermore, the data available in this data set are not longitudinal, because no recordings span a time frame before and after a significant cardiovascular event. Longitudinal data of this kind must be used to validate the hypothesis that the LSTM networks, as trained, have adequately inferred the lead transformations that follow the subject's anatomy.
Regarding the evaluation involving the extraction of VCG parameters, there is an underlying assumption that the extraction algorithms were accurate; we did not evaluate the performance of these algorithms on their own. The use of the same algorithm for all data eliminates potential biases in comparisons, but further testing against a labeled VCG data set is necessary to assess the performance of the algorithms.

Conclusions
The personalized transformations performed better than generalized transformations in waveform comparisons and in the error performance of extracted diagnostic parameters from VCG waveforms. The use of personalized transformations for the derivation of VCG from S12 has the potential to improve and augment the diagnostic yield and accuracy of an S12 interpretation. The differences between personalized LSTM and linear regression transformations were marginally in favor of personalized LSTM; there were no statistically significant differences in performance between them. A study focused on patient outcomes and diagnostic yield is needed to evaluate the clinical impact of using such an approach for the derivation of VCG from S12 and using it as part of the patient management plan for a broader population. On balance, the results in this study suggest that personalization should be the preferred approach for ECG lead derivations.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/eng4020078/s1, Table S1. Lists the coefficients for the linear transformation between standard 12-lead ECG and Frank XYZ VCG; Table S2. Hyperparameter optimization variables, bounds, and sampling transformations.