Abstract
This research investigates how dataset characteristics influence the performance and generalization of deep learning models trained on ECG data. The study evaluates multiple subsets of the TNMG dataset, curated to varying degrees, to assess their impact on model performance. Additionally, an attention mechanism is introduced to enhance model accuracy and generalization. The experimental results show that models trained on balanced subsets and incorporating the attention mechanism consistently outperform those trained on unbalanced data or without attention, underscoring the importance of both dataset balance and the attention mechanism for model performance.
Surprisingly, the largest ECG dataset, TNMG, generalized less effectively than smaller, curated subsets. The study demonstrates that a well-balanced and thoughtfully curated dataset, combined with the attention mechanism, can yield competitive model performance even at a significantly smaller size.
These findings underscore the importance of dataset curation, balance, and attention mechanisms in biomedical machine learning. Models that use an attention mechanism and are trained on well-balanced, thoughtfully curated datasets can outperform models trained on larger, unbalanced datasets, which challenges the conventional preference for larger datasets and offers potential advancements in medical data analysis and patient care.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This paper employs publicly accessible datasets such as CPSC and SNH, in addition to the TNMG dataset. The TNMG dataset is not publicly available, but access can be granted upon request with approval from the dataset owner.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
(email: zhua2351{at}sydney.edu.au, leyu3109{at}sydney.edu.au, lher7073{at}sydney.edu.au, duy.truong{at}sydney.edu.au, omid.kavehei{at}sydney.edu.au).
(email: antonio.horta.ribeiro{at}it.uu.se)
The manuscript has been substantially revised in both its figures and its discussion.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.