Real-time classification of causes of death using Artificial Intelligence: sensitivity analysis

Background: In 2021, the European Union registered over 365,000 excess deaths, including over 16,000 in Portugal. The Portuguese Directorate-General of Health (DGS) has developed a deep neural network, AUTOCOD, that codes primary causes of death by analyzing the free text in physicians' death certificates (DC). Although the performance of AUTOCOD has already been demonstrated, it was not clear whether this performance remained stable over time, especially during periods of excess mortality. Objective: To determine the sensitivity of AUTOCOD, compared with manual coding, for classifying the underlying cause of death in periods of excess mortality. Methods: We included all DC registered between 2016 and 2019. We evaluated the performance of AUTOCOD through a confusion matrix, comparing the ICD-10 classifications of DC by AUTOCOD with those by the human coders at DGS (gold standard). Next, we compared periods without excess mortality with periods of excess, severe excess and extreme excess mortality. Lastly, we repeated the analyses for the three most common ICD-10 chapters, targeting classification at the block level. Results: AUTOCOD showed high sensitivity (≥0.75) for ten of the ICD-10 chapters studied, with values above 0.90 for the most prevalent chapters (II – neoplasms; IX – diseases of the circulatory system; X – diseases of the respiratory system). These high sensitivity values showed no significant differences between periods without excess mortality and periods of excess, severe excess and extreme excess mortality. At the ICD-10 block level for the three most common chapters, AUTOCOD again performed well, showing high sensitivity (≥0.75) for 13 ICD-10 blocks, with no significant differences between periods without and with excess mortality.
Conclusions: Our results suggest that even during periods of excess and extreme excess mortality, the performance of AUTOCOD is not affected by a potential loss in text quality due to pressure on health services. Thus, AUTOCOD can be reliably used for real-time specific-cause mortality surveillance even in extreme excess mortality. Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.


Introduction
In 2021, over 365,000 excess deaths were registered in the European Union, more than 16,000 of them in Portugal. Although most of these excess deaths are possibly related to the COVID-19 pandemic, excess deaths are generally attributable to preventable causes, which makes a case for real-time cause-specific mortality surveillance, enabling a timely and appropriate Public Health response and suitable health policies in periods of excess mortality.
The Portuguese Directorate-General of Health (DGS) is responsible for processing data from the Death Certificate Information System (SICO) and for the epidemiological surveillance of mortality 1 . SICO all-cause mortality data are automatically analyzed and publicly accessible 2 . However, the analysis of death certificates (DC) requires manual coding of the primary causes of death according to the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD-10) 3 . This manual coding is a resource-consuming task that hinders real-time cause-specific mortality surveillance.
Excess mortality is defined by the World Health Organization (WHO) as mortality above what would be expected. It allows the magnitude of a potential Public Health crisis to be assessed by quantifying the additional deaths compared with a reference period, followed by an in-depth analysis of their causes 4,5 .
Excess mortality can be estimated in several ways. In Portugal, a period of excess mortality starts either with two consecutive observed numbers of deaths above the upper 95% confidence limit of the baseline or with a single observed number of deaths above the upper 99% confidence limit of the baseline, and it ends with two consecutive values below the 95% limit 6 . This methodology is aligned with the practice of the European Mortality Monitoring Project (EuroMOMO), which allows for the real-time detection and measurement of periods of all-cause excess mortality resulting from threats to Public Health in Europe 7 .
Most excess mortality surveillance systems, such as EuroMOMO or national systems, rely on all-cause mortality to ensure real-time surveillance. However, in many countries, cause-of-death information is not readily available, since a human step is required to code the underlying cause of death, which delays the surveillance and monitoring of cause-specific mortality. For instance, in Portugal, manual coding of the primary causes of death for a given year is completed only by March of the following year 8,9 .
To overcome this problem, Portugal developed a deep neural network called AUTOCOD 10,11 , which pre-suggests primary causes of death based on historical DC data (except for neonatal and perinatal mortality) with high sensitivity, specificity and accuracy. AUTOCOD can also analyze data from autopsy reports and clinical bulletins (for deaths that occur in healthcare facilities). Ultimately, the algorithm increased coder productivity, sped up the release of results and information, and enabled near real-time mortality surveillance 10,11 .
To our knowledge, complex Artificial Intelligence algorithms that suggest underlying causes of death from the free text of DC, such as AUTOCOD, are not widely disseminated 12 .
This study aimed to determine the sensitivity and specificity of AUTOCOD, compared with manual coding, for classifying the underlying cause of death in periods of excess mortality.
AUTOCOD has already shown high sensitivity, specificity and accuracy in periods without excess mortality. However, it was unclear whether this performance was maintained in periods of excess mortality, when the free text recorded in the DC could change because of the pressure on health services and the need to complete a larger number of DC. A satisfactory performance by AUTOCOD could pave the way for its implementation as a real-time surveillance tool to monitor cause-specific mortality, even when the national health system is under severe pressure 12,13 .

Study population
In this study, we included all DC registered in Portugal's SICO, starting from January 1, 2016, until August 8, 2019.
We excluded DC referring to neonatal, perinatal and maternal mortality because the AUTOCOD algorithm is not trained for these underlying causes of death 10 . Each DC was classified according to ICD-10 by two different methods: manually, by the human coders at DGS (gold standard), and automatically, by AUTOCOD.

Study design and datasets
The methods behind the construction of the AUTOCOD algorithm are explained in detail in previous publications; the algorithm was originally trained and tested on a dataset different from the one used in the current study 10,11 . The manual coding of causes of death adheres to the WHO nomenclature regulations specified in ICD-10 and applies the international ICD rules for selecting the underlying cause of death 3 . The DC dataset was then linked with two ICD-10 dictionaries to translate block and chapter codes into text descriptions. The DC dataset was also linked to the national all-cause mortality surveillance dataset 2 , which provides the daily count of observed deaths and the baseline of expected deaths, defined according to the EuroMOMO methodology 14 .

Excess mortality definition
Using this dataset, we defined the periods in which excess mortality was observed according to EuroMOMO's Z-score for excess mortality and the Westgard rules 15 : a period of excess mortality starts when there are two consecutive days with a Z-score above the 95% limit of the baseline, or a single day above the 99% limit, and it ends with two consecutive days below the 95% limit of the baseline.
We also defined two metrics for periods of severe and extreme excess mortality: two consecutive days with a Z-score above four standard deviations (+4 SD) and above six standard deviations (+6 SD), respectively. The Westgard functions used to classify the different periods can be found in the Supplementary Materials.
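The period rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Westgard functions from the Supplementary Materials: it assumes a list of daily EuroMOMO-style Z-scores, takes 1.96 and 2.576 as the Z values of the upper 95% and 99% limits (an assumption), and treats the two closing days below the 95% limit as outside the period. The helper names `flag_excess` and `flag_runs_above` are hypothetical.

```python
from typing import List

# Assumed Z-score cut-offs for the upper 95% and 99% limits of the baseline.
Z95 = 1.96
Z99 = 2.576


def flag_excess(z: List[float], z95: float = Z95, z99: float = Z99) -> List[bool]:
    """Flag days inside a period of excess mortality.

    A period starts with two consecutive days above the 95% limit or a
    single day above the 99% limit, and ends with two consecutive days
    below the 95% limit (assumption: the two closing days are not flagged).
    """
    flags = [False] * len(z)
    in_excess = False
    for i, value in enumerate(z):
        if not in_excess:
            if value > z99:
                in_excess = True
                flags[i] = True
            elif i > 0 and value > z95 and z[i - 1] > z95:
                in_excess = True
                flags[i - 1] = flags[i] = True
        elif value < z95 and z[i - 1] < z95:
            # Second consecutive day below the limit: close the period and
            # un-flag the first closing day, which was provisionally flagged.
            in_excess = False
            flags[i - 1] = False
        else:
            flags[i] = True
    return flags


def flag_runs_above(z: List[float], threshold: float, min_run: int = 2) -> List[bool]:
    """Flag days in runs of at least `min_run` consecutive days above `threshold`
    (e.g. threshold=4 for severe and threshold=6 for extreme excess mortality)."""
    flags = [False] * len(z)
    i = 0
    while i < len(z):
        if z[i] > threshold:
            j = i
            while j < len(z) and z[j] > threshold:
                j += 1
            if j - i >= min_run:
                flags[i:j] = [True] * (j - i)
            i = j
        else:
            i += 1
    return flags


if __name__ == "__main__":
    daily_z = [0.5, 2.1, 2.3, 1.0, 1.5, 0.2, 0.1, 3.0, 0.0, 0.0, 0.3]
    print(flag_excess(daily_z))
    # -> [False, True, True, False, False, False, False, True, False, False, False]
```

Days not flagged by `flag_excess` would form the "without excess mortality" group; whether the closing days belong to the period is a design choice the source does not settle.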

Statistical analysis
To obtain the multi-class confusion matrix, we used the function "confusionMatrix" of the "caret" package (v.6.0-90) in R, via RStudio 16,17 . In a multi-class problem such as classifying ICD-10 chapters and blocks, "confusionMatrix" reports a set of "one-versus-all" results. For example, in a three-class problem, the sensitivity of the first class is calculated against all the samples in the second and third classes (and so on). The resulting confusion matrix summarizes the prediction results of a classification problem.
The numbers of correct and incorrect predictions are summarized as counts and broken down by class. The confusion matrix shows where a classification model, such as AUTOCOD, is confused when it makes predictions. These counts are organized into a table, or matrix, in which each row corresponds to a predicted class (i.e., AUTOCOD) and each column corresponds to an actual class (i.e., human coders at DGS).
The counts of correct and incorrect classifications are then entered into the table. The correct predictions for a class are counted in the cell where the row and the column both correspond to that class (the diagonal of the matrix). The incorrect predictions are counted in the off-diagonal cells, in the row of the class predicted by AUTOCOD and the column of the class assigned by the human coders.
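The matrix layout described above (rows for AUTOCOD's predicted class, columns for the human coders' class) and the one-versus-all reduction applied to each class can be illustrated with a short Python sketch. It is not the "caret" implementation; the function names and the toy chapter labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def confusion_matrix(predicted: Sequence[str], actual: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows = predicted class (AUTOCOD), columns = actual class (human coders)."""
    counts = Counter(zip(predicted, actual))
    return [[counts[(p, a)] for a in classes] for p in classes]


def one_vs_all(matrix: List[List[int]], classes: Sequence[str],
               target: str) -> Tuple[int, int, int, int]:
    """Collapse the multi-class matrix into (TP, FP, FN, TN) for one class."""
    k = list(classes).index(target)
    n = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # diagonal cell
    fp = sum(matrix[k]) - tp                   # predicted `target`, coded as another class
    fn = sum(row[k] for row in matrix) - tp    # coded as `target`, predicted as another class
    tn = n - tp - fp - fn                      # neither predicted nor coded as `target`
    return tp, fp, fn, tn


if __name__ == "__main__":
    chapters = ["II", "IX", "X"]
    autocod = ["II", "II", "IX", "IX", "X", "IX"]   # toy predictions
    coders = ["II", "IX", "IX", "IX", "X", "X"]     # toy gold standard
    cm = confusion_matrix(autocod, coders, chapters)
    print(cm)                              # [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
    print(one_vs_all(cm, chapters, "IX"))  # (2, 1, 1, 2)
```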
Finally, we performed a sensitivity analysis (also with the R package "caret") to compare the classifications obtained with the AUTOCOD algorithm (index test) with those made by human coders (gold standard) 18 .
This allowed us to obtain the numbers of true positives and false positives, as well as additional metrics such as sensitivity (recall), specificity, accuracy, precision (positive predictive value) and the F1-score 11 . This step was repeated over time, comparing excess mortality periods with periods without excess mortality, and extreme excess mortality periods with periods without excess mortality, at both the chapter and block levels of ICD-10 11 . The formulas used for all these performance metrics can be found in the Supplementary Materials.
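For reference, the per-class metrics named above can be computed from the one-versus-all counts with their standard textbook definitions. The sketch assumes these definitions match the formulas in the Supplementary Materials, which are not reproduced here; the `class_metrics` helper and the counts in the example are invented for illustration.

```python
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Return num/den, or NaN when the denominator is zero (e.g. a class never predicted)."""
    return num / den if den else math.nan


def class_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard one-versus-all metrics for a single ICD-10 chapter or block."""
    return {
        "sensitivity": _ratio(tp, tp + fn),          # recall
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),            # positive predictive value
        "npv": _ratio(tn, tn + fn),                  # negative predictive value
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),      # harmonic mean of precision and recall
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Invented counts for one class out of 1,000 certificates.
    for name, value in class_metrics(tp=90, fp=5, fn=10, tn=895).items():
        print(f"{name}: {value:.3f}")
```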
To assess the quality of AUTOCOD, we chose to present the weighted average of performance metrics such as sensitivity, precision and F1-score, i.e., the mean of the per-class metrics weighted by the number of actual occurrences of each class in the dataset. The "weight" is the proportion of each class's actual occurrences relative to the total number of occurrences; the full formula for this weighted average is in the Supplementary Material. We preferred it to the macro-average of performance metrics (i.e., the arithmetic mean of the per-class metrics, which assigns equal importance to each chapter or block) 10 , because the latter would artificially increase the weight of rare or infrequent cause-of-death chapters and blocks in the average.
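The difference between the two averaging schemes can be made concrete with invented numbers: one frequent class with high sensitivity and one rare class with low sensitivity. The sketch assumes, as described above, that the weights are each class's share of actual (human-coded) occurrences; the helper names are hypothetical.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], support: Sequence[int]) -> float:
    """Average of per-class metrics weighted by each class's actual occurrences."""
    total = sum(support)
    return sum(v * n / total for v, n in zip(values, support))


def macro_average(values: Sequence[float]) -> float:
    """Unweighted arithmetic mean: every class counts equally."""
    return sum(values) / len(values)


if __name__ == "__main__":
    sensitivity = [0.95, 0.50]   # invented: frequent chapter, rare chapter
    support = [900, 100]         # invented numbers of actual occurrences
    print(round(weighted_average(sensitivity, support), 3))  # 0.905
    print(round(macro_average(sensitivity), 3))              # 0.725
```

Here the rare class pulls the macro-average down to 0.725, whereas the weighted average stays close to the performance on the class that accounts for most certificates.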
In the dataset, one death certificate was not properly coded by AUTOCOD, so both its AUTOCOD and DGS ICD-10 classifications were excluded.

Description of the Dataset
The dataset includes 330,098 death certificates, each classified twice: every death certificate was classified both by human coders and by AUTOCOD. The three most common ICD-10 chapters assigned by human coders are chapter IX – Diseases of the circulatory system (97,420; 29.51%), chapter II – Neoplasms (85,837; 26.00%), and chapter X – Diseases of the respiratory system (40,202; 12.18%). A more extensive descriptive analysis of this dataset, including the disaggregation of DC by year, ICD-10 chapter or block, and time, can be found in the Supplementary Materials.
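As a quick arithmetic check, the chapter shares quoted above follow from dividing each chapter's count by the 330,098 certificates, and their sum gives the 223,459 certificates (67.69%) used later in the block-level analysis.

```python
TOTAL_DC = 330_098
chapter_counts = {
    "IX - Diseases of the circulatory system": 97_420,
    "II - Neoplasms": 85_837,
    "X - Diseases of the respiratory system": 40_202,
}

for chapter, count in chapter_counts.items():
    print(f"{chapter}: {count:,} ({100 * count / TOTAL_DC:.2f}%)")

top_three = sum(chapter_counts.values())
print(f"Top three chapters: {top_three:,} ({100 * top_three / TOTAL_DC:.2f}%)")
```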

Confusion Matrix and Performance Metrics for ICD-10 Chapter
The "caret" package provides the confusion matrix, from which AUTOCOD's performance metrics are calculated. Other performance metrics calculated for AUTOCOD can be found in the Supplementary Material.

As presented in Table 3, specificity was high in all ICD-10 chapters. Considering the weighted average of all chapters, the performance metrics obtained for AUTOCOD are presented in Table 4. For sensitivity, positive predictive value and F1-score, there is no difference between periods without excess mortality and periods of excess mortality (<0.01), a decrease of 0.01 from periods without excess mortality to periods of severe excess mortality (+4 SD), and a decrease of 0.04 from periods without excess mortality to periods of extreme excess mortality (+6 SD). Weighted-averaged [...] (0.12), XVI (0.07), and XVII (0.07). For the three most common chapters, the differences are 0.01 (chapter II – Neoplasms), 0.01 (chapter IX – Diseases of the circulatory system), and 0.00 (chapter X – Diseases of the respiratory system). When comparing the sensitivity of AUTOCOD between periods without excess mortality and periods of extreme excess mortality (Z-score above six standard deviations), the biggest differences occur in chapters XVII – Congenital malformations, deformations and chromosomal abnormalities (0.19), III – Diseases of the blood and blood-forming organs and certain disorders involving the immune system (0.17), XIII – Diseases of the musculoskeletal system and connective tissue (0.10), and XII – Diseases of the skin and subcutaneous tissue (0.08). For the three most common chapters, the differences are 0.00 (chapter II – Neoplasms), 0.00 (chapter IX – Diseases of the circulatory system), and 0.00 (chapter X – Diseases of the respiratory system).

The differences in the performance measures of AUTOCOD between periods without excess mortality and periods of excess mortality or extreme excess mortality can be seen across Figure 2.
Difference between periods without excess mortality and periods with excess mortality, severe excess mortality or extreme excess mortality, by chapter
Considering the differences between periods of excess mortality and periods without excess mortality, it is important to analyze in which blocks the biggest differences occur.

Block
According to Table 8, the biggest differences in the sensitivity of AUTOCOD between periods without excess mortality and periods of excess mortality occur in blocks J00-J06 – Acute upper respiratory infections (0.34), J30-J39 – Other diseases of upper respiratory tract (0.28), and I95-I99 – Other and unspecified disorders of the circulatory system (0.08).
As for the difference in sensitivity between periods without excess mortality and periods of severe excess mortality (+4 SD), the biggest differences occur in blocks J00-J06 – Acute upper respiratory infections (0.41), J85-J86 – Suppurative and necrotic conditions of lower respiratory tract (0.23), J30-J39 – Other diseases of upper respiratory tract (0.20), and I05-I09 – Chronic rheumatic heart diseases (-0.22). The biggest differences in the sensitivity of AUTOCOD between periods without excess mortality and periods of extreme excess mortality (+6 SD) occur in blocks J00-J06 – Acute upper respiratory infections (0.41), J85-J86 – Suppurative and necrotic conditions of lower respiratory tract (0.31), and I05-I09 – Chronic rheumatic heart diseases (-0.26). The differences in the performance measures of AUTOCOD between periods without excess mortality, periods of excess mortality and periods of extreme excess mortality, by ICD-10 block, can be seen across Figure 3. Additional AUTOCOD performance comparisons between periods can be found in the Supplementary Material.
[...] (chapter II – neoplasms, IX – diseases of the circulatory system, and X – diseases of the respiratory system, which together account for 67.69% of all human-coded causes of death). The weighted average of sensitivity in the ICD-10 chapter analysis showed no difference between periods without excess mortality and periods of excess mortality, a difference of 0.01 for periods of severe excess mortality (+4 SD), and a difference of 0.04 for periods of extreme excess mortality (+6 SD). In the ICD-10 block analysis, the weighted average of sensitivity differed by 0.01 between periods without excess mortality and periods of excess mortality, as well as periods of severe and extreme excess mortality (at the +4 SD and +6 SD thresholds).
Across the time periods considered in the ICD-10 chapter analysis, AUTOCOD showed consistently good performance, with sensitivity (recall), positive predictive value (precision) and F1-score as high as 0.88 in periods without excess mortality and periods of excess mortality, and as low as 0.84 in periods of extreme excess mortality (+6 SD). When considering only the most common chapters (chapter II – neoplasms, IX – diseases of the circulatory system, and X – diseases of the respiratory system), sensitivity ranges from 0.94 to 0.95 in chapter II, is 0.91 in chapter IX, and ranges from 0.89 to 0.90 in chapter X across the periods analyzed. The positive predictive value ranges from 0.96 to 0.98 in chapter II, from 0.90 to 0.92 in chapter IX, and from 0.83 to 0.86 in chapter X. As for the F1-score, AUTOCOD achieves 0.96 in chapter II, 0.91 in chapter IX, and 0.86 to 0.88 in chapter X.
AUTOCOD presented high specificity and high negative predictive values in all the analyses performed. This was expected, since the number of true negatives was consistently much higher than the number of true positives. This is not a characteristic of AUTOCOD itself, but a consequence of how we handled the sample and framed the question as a classification problem with a one-versus-all solution. This method is widely used for classification problems with multiple output classes: in our case, each ICD-10 chapter or block is handled as if in a binary model, so that each class is assessed individually against all the other classes.
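A small worked example shows why one-versus-all specificity is almost automatically high for any class that is a minority of the dataset. It uses the chapter X count from the dataset description (40,202 of 330,098 certificates) together with a purely hypothetical 6,000 false positives; specificity still stays near 0.98, because the true negatives include every certificate from all other chapters.

```python
TOTAL_DC = 330_098
CHAPTER_X = 40_202           # certificates coded as chapter X by human coders
false_positives = 6_000      # hypothetical, for illustration only

negatives = TOTAL_DC - CHAPTER_X              # all certificates from other chapters
true_negatives = negatives - false_positives
specificity = true_negatives / negatives
print(f"negatives={negatives:,}, specificity={specificity:.3f}")
```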
It should be noted that chapter XVIII (symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified) consistently presented high performance metrics for AUTOCOD. This chapter does not reflect a precise certification of the cause of death, but the result could imply that when human coders have difficulty classifying the cause of death, so does AUTOCOD.
These results are aligned with previous studies using AUTOCOD 10,11 and, in general, with the literature on deep neural networks applied to the automatic classification of death certificates 12,23,24 . Falissard and colleagues developed a deep neural network for automated coding of the underlying cause of death with a test accuracy of 0.978 (95% CI 0.977-0.979) 12 and an F-measure of 0.952 (95% CI 0.946-0.957) 23 . Della Mea and colleagues' proposed approach for automated coding of causes of death had an accuracy of 0.990 (95% CI 0.990-0.991) and macro-averaged accuracy and F1-scores of 0.974 and 0.968, respectively. Similarly to our study, Della Mea et al. found that accuracy is low for chapters with very rare causes of death, which can therefore be disregarded 24 . However, to the best of our knowledge, this is the first time that a deep neural network classifying underlying causes of death has been evaluated by comparing its performance across timeframes defined by their excess mortality.
Automatic classification of death certificates relies on Natural Language Processing (NLP) techniques and algorithms.
NLP can translate the free text written by the physician who certified the death into classification codes based on ICD-10. However, this process depends on the text quality of the analyzed death certificates. By text quality we mean how successfully a text can be automatically classified, retrieved, or mined for information 25 . Text quality is therefore not a single aspect but a combination of numerous criteria, including spelling, grammar, organization, informativeness and page layout 26 . Extracting these attributes can become problematic in low-quality texts (poor grammar, many abbreviations, short sentences), a known problem in medical and clinical texts such as patient records or death certificates 26 . Because systems such as NLP models rely on these attributes, text quality affects the overall performance of the algorithms: a poor-quality text may yield poor-quality predictions. To overcome this limitation, since AUTOCOD's development the neural network has included a processing layer that always reads words in text fields as the closest word the model knows (e.g., it currently recognizes more than 25 misspellings of the word Alzheimer). This processing layer can therefore help minimize errors and abbreviations in text fields during periods of excess mortality [27][28][29] .
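The idea of reading each word as the closest word in the model's vocabulary can be sketched with Python's standard-library `difflib.get_close_matches`. This is a swapped-in technique for illustration only, not AUTOCOD's actual processing layer; the tiny vocabulary, the 0.8 similarity cutoff and the `normalize` helper are assumptions.

```python
import difflib
from typing import List, Sequence

# Hypothetical vocabulary; a real model would use its full training vocabulary.
VOCABULARY = ["alzheimer", "dementia", "pneumonia", "infarction", "sepsis"]


def normalize(text: str, vocabulary: Sequence[str] = VOCABULARY,
              cutoff: float = 0.8) -> List[str]:
    """Replace each token by its closest known word, keeping tokens with no close match."""
    normalized = []
    for token in text.lower().split():
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        normalized.append(match[0] if match else token)
    return normalized


if __name__ == "__main__":
    print(normalize("Alzeimer pneumonai"))  # ['alzheimer', 'pneumonia']
```

Keeping unmatched tokens unchanged, rather than forcing a match, avoids mapping genuinely unknown words onto unrelated vocabulary entries; the cutoff controls that trade-off.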
Our results suggest that, even in periods of excess and extreme excess mortality, when the volume of deaths and the pressure on health services may increase, with a consequent burden on the doctors who certify deaths and a potential impact on the quality of the DC text, AUTOCOD's performance remains unaffected.

Limitations
An important limitation of this study is that human coders have access to AUTOCOD's automatic classification of the DC, meaning that the gold standard used in this research might be biased by the very algorithm we are evaluating. However, this implementation only entered production on July 26, 2019, so for the majority of the dataset used in this study, the manual classification was unbiased.
Additionally, there is the matter of ICD-10 code ambiguity. This is a known limitation of ICD-10, for both human coders and automatic classification algorithms, and it can be explained by the sometimes subtle differences between codes for similar causes of death. This might explain the difference in sensitivity between respiratory blocks such as J00-J06 (Acute upper respiratory infections) and J09-J18 (Influenza and pneumonia), the latter being a less ambiguous cause of death than the former, for both human and automatic classification. Such unspecified codes are not necessarily errors, but rather indicate incomplete clinical information in DC for which sufficient information is not known or available to assign a more specific code. Human coders often look for more clinical information in electronic health records, whereas AUTOCOD is restricted to the information included in the DC. This stresses the importance of well-completed and detailed DC by the certifying physician, even in periods of excess mortality.
The human coders, whom we set as our ground truth, are not mistake-free. Current research puts the reliability of human coders at around 70 to 89% (reliability measures both the agreement between coders and the consistency of each individual coder) 30 . These scores can be partly explained by the use of different codes for similar diseases.
Another possible limitation, known in artificial intelligence algorithms, is the generalizability of our results to other countries 31 . This question of model transferability requires further studies, but we are confident that our results can be generalized to other algorithms that rely on NLP for automatic classification, without a major impact on model performance, even in periods of excess mortality.

Strengths
In Portugal, Law No. 15/2012 of April 3, 2012 established the Information System for Death Certificates (SICO), a mortality information system based on the electronic registration of DC 32 . Since then, SICO has become a widespread tool used by doctors nationwide and, therefore, a well-established source of mortality data and information, as well as an international example of timely mortality statistics 1 .
AUTOCOD was built on the already widespread availability of DC in electronic format and has since been validated as an important tool for the automatic assignment of ICD-10 codes to causes of death 10 . However, this validation never considered temporal differences that might affect the quality of DC and, consequently, the performance of AUTOCOD.
The method we used to evaluate the performance of AUTOCOD during excess and extreme excess mortality periods is a known method for comparing the performance of an index test with a ground truth or gold standard, and it makes a case for evaluating algorithms and models across different periods and in the ever-changing environments that might affect their overall performance.
Given a completed DC, AUTOCOD can be used to accurately classify underlying causes of death in real time, even in periods of excess mortality, supporting the view that deep neural networks are robust to possible changes in the underlying quality of the text. Furthermore, by defining a baseline from the past (Portugal has digital DC data going back to 2014), changes in mortality and periods of excess mortality can be detected in real time, with high sensitivity, without waiting for human classification of the cause of death, especially for the more common and less ambiguous causes of death.
Finally, with this algorithm, our data could be used to predict excess deaths that follow seasonal patterns, such as those from influenza and pneumonia.

Implications of our work
Our work makes a case for using AUTOCOD for real-time mortality surveillance by ICD-10 code, and the approach can be further validated by other countries that wish to train their own neural networks for medical and clinical text classification. Our research also underlines the importance of auditing, evaluating and continuously monitoring artificial intelligence algorithms in order to identify potential barriers, strengths and opportunities 33 .
As the AUTOCOD algorithm is robust, it can be used to classify the underlying causes of death in periods of excess mortality. Further investigations should be carried out, including a comparison of AUTOCOD with other automated coding systems and a new evaluation of AUTOCOD's behavior during the periods of excess mortality caused by the COVID-19 pandemic 12,13,24 . To strengthen coding practices, it would also be important to conduct an inter-coder reliability study at DGS.

Conclusions
This article makes the case that deep neural networks are powerful tools for the automatic classification of primary causes of death.

Disclosure of funding and conflicts of interest
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
The authors declare no conflict of interest.

Data sharing
Data is available upon reasonable request.

Informed consent statement
Patient consent was waived because data were de-identified and used for Public Health research purposes.

Figure 1 – Flowchart of the study population inclusion criteria
As expected, death certificates for periods of excess mortality are fewer (n=186,834; 93,417 for each source, AUTOCOD and human coders, i.e., 28.30% of the total DC) than death certificates for periods without excess mortality (n=473,362; 236,681 for each source, i.e., 71.70% of the total DC). For periods of severe and extreme excess mortality, with a Z-score above +4 standard deviations (n=60,220; 30,110 for each source) or above +6 standard deviations (n=12,480; 6,240 for each source), the DC are even fewer. Considering only the three most common chapters of the dataset (chapters II, IX and X), we performed the same analysis for the classification of ICD-10 blocks, covering 223,459 DC (67.69% of the total DC) throughout the study period. The five most common blocks classified in DC are C00-C97 (Malignant neoplasms), I60-I69 (Cerebrovascular diseases), I30-I52 (Other forms of heart disease), I20-I25 (Ischemic heart diseases) and J09-J18 (Influenza and pneumonia).
For the periods without excess mortality, specificity in all ICD-10 chapters was over 0.97. The highest values of sensitivity (recall) are for chapters II – Neoplasms (0.95), XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.93), and IX – Diseases of the circulatory system (0.91). The highest positive predictive values (precision) are for chapters XVI – Certain conditions originating in the perinatal period (1.00), II – Neoplasms (0.98), and IX – Diseases of the circulatory system (0.92). The highest F1-scores are for chapters II – Neoplasms (0.96), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.90).
For the excess mortality periods, specificity in all ICD-10 chapters was over 0.96. The highest values of sensitivity (recall) are for chapters II – Neoplasms (0.95), XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.93), and IX – Diseases of the circulatory system (0.91). The highest positive predictive values (precision) are for chapters II – Neoplasms (0.97), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.88). The highest F1-scores are for chapters II – Neoplasms (0.96), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.90).
For the periods of severe excess mortality (+4 SD), specificity in all ICD-10 chapters was over 0.96. The highest values of sensitivity (recall) are for chapters II – Neoplasms (0.94), XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.92), and IX – Diseases of the circulatory system (0.91). The highest positive predictive values (precision) are for chapters II – Neoplasms (0.97), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.88). The highest F1-scores are for chapters II – Neoplasms (0.96), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.90).
For the periods of extreme excess mortality (+6 SD), specificity in all ICD-10 chapters was over 0.96. The highest values of sensitivity (recall) are for chapters II – Neoplasms (0.95), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.90). The highest positive predictive values (precision) are for chapters II – Neoplasms (0.96), IX – Diseases of the circulatory system (0.90), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.88). The highest F1-scores are for chapters II – Neoplasms (0.96), IX – Diseases of the circulatory system (0.91), and XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere specified (0.89).

The comparison of these metrics between periods is shown in Figure 2. The absolute numbers of observations for each of the periods analyzed, and additional comparisons of AUTOCOD performance measures, can be found in the supplementary material.
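The periods compared above are distinguished by how far observed mortality rises above an expected baseline: severe excess at +4 SD and extreme excess at +6 SD. The baseline model, the time unit and the cut-off for plain excess mortality are not described in this section, so the sketch below assumes that an expected count and its standard deviation are already available for each period and uses a hypothetical +2 SD cut-off for "excess". The function names are illustrative, reaching a cut-off exactly is assumed to count, and only the most severe level reached is returned (whether the study treats the categories as nested is not stated here).

```python
from typing import List, Sequence, Tuple

# Cut-offs in standard deviations (SD) above the expected baseline, checked
# from most to least severe. +6 SD (extreme) and +4 SD (severe) follow the
# text; the +2 SD cut-off for plain "excess" is an assumption.
THRESHOLDS: Tuple[Tuple[float, str], ...] = (
    (6.0, "extreme excess"),
    (4.0, "severe excess"),
    (2.0, "excess"),  # assumed, not stated in this section
)


def z_score(observed: float, expected: float, sd: float) -> float:
    """Standardized deviation of observed deaths from the expected baseline."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (observed - expected) / sd


def classify_period(observed: float, expected: float, sd: float) -> str:
    """Return the most severe mortality level whose cut-off is reached."""
    z = z_score(observed, expected, sd)
    for cutoff, label in THRESHOLDS:
        if z >= cutoff:  # reaching the cut-off counts (assumption)
            return label
    return "no excess"


# Toy weekly counts against a hypothetical baseline of 2000 ± 100 deaths.
observed_weeks: Sequence[float] = [1950, 2250, 2450, 2700]
levels: List[str] = [classify_period(o, 2000.0, 100.0) for o in observed_weeks]
print(levels)
```

Once each period is labelled this way, the performance metrics can be recomputed separately on the death certificates falling in each group and compared with those from periods without excess mortality.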

Figure 2 – Comparison between performance metrics of AUTOCOD during periods of excess mortality, severe excess mortality and extreme excess mortality and periods without excess mortality, for ICD-10 chapters
Caption: I – Certain infectious and parasitic diseases; II – Neoplasms; III – Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism; IV – Endocrine, nutritional and metabolic diseases; V – Mental and behavioural disorders; VI – Diseases of the nervous system; VII – Diseases of the eye and adnexa; VIII – Diseases of the ear and mastoid process; IX – Diseases of the circulatory system; X – Diseases of the respiratory system; XI – Diseases of the digestive system; XII – Diseases of the skin and subcutaneous tissue; XIII – Diseases of the musculoskeletal system and connective tissue; XIV – Diseases of the genitourinary system; XV – Pregnancy, childbirth and the puerperium; XVI – Certain conditions originating in the perinatal period; XVII – Congenital malformations, deformations and chromosomal abnormalities; XVIII – Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified; XIX – Injury, poisoning and certain other consequences of external causes; XX – External causes of morbidity and mortality

Figure 3 – Comparison between performance metrics of AUTOCOD during periods of excess mortality and periods without excess mortality, for ICD-10 blocks
Caption: C00-C97 – Malignant neoplasms; D00-D09 – In situ neoplasms; D10-D36 – Benign neoplasms; D37-D48 – Neoplasms of uncertain or unknown behaviour; I05-I09 – Chronic rheumatic heart diseases; I10-I15 – Hypertensive diseases; I20-I25 – Ischaemic heart diseases; I26-I28 – Pulmonary heart disease and diseases of pulmonary circulation; I30-I52 – Other forms of heart disease; I60-I69 – Cerebrovascular diseases; I70-I79 – Diseases of arteries, arterioles and capillaries; I80-I89 – Diseases of veins, lymphatic vessels and lymph nodes, not elsewhere classified; I95-I99 – Other and unspecified disorders of the circulatory system; J00-J06 – Acute upper respiratory infections; J09-J18 – Influenza and pneumonia; J20-J22 – Other acute lower respiratory infections; J30-J39 – Other diseases of upper respiratory tract; J40-J47 – Chronic lower respiratory diseases; J60-J70 – Lung diseases due to external agents; J80-J84 – Other respiratory diseases principally affecting the interstitium; J85-J86 – Suppurative and necrotic conditions of lower respiratory tract; J90-J94 – Other diseases of pleura; J95-J99 – Other diseases of the respiratory system

mortality, with no need to wait for manual coding, which allows for adequate real-time cause-specific mortality surveillance, timely assessment of risks to Public Health, and definition of priorities and planning of responses, both in periods without excess mortality and in periods with excess mortality. Real-time cause-specific mortality surveillance is not widely carried out worldwide and might benefit from further investigation and real-world implementation. This investigation is a step forward in Portugal towards the widespread use of AUTOCOD to classify specific causes of death, with renewed confidence in its results regardless of the presence of excess mortality, and towards the implementation of targeted public health interventions and practices.
of death according to ICD, even during excess mortality periods. Our work could further the use of deep neural networks for automatic clinical coding of diseases, medical procedures or death certificates. It may also contribute to real-time monitoring and surveillance of Public Health threats and problems, allowing for timely action. More broadly, this study highlights the importance of Artificial Intelligence algorithms as an advisory tool for Public Health policies and measures.

Table 1 -
Description of the study population by excess mortality and type of death certificate coding

Table 2 -
Description of the study population for the three most common chapters (II, IX and X), for all the periods analyzed

Table 3 -
Performance Metrics of AUTOCOD, by chapter, for all the periods analyzed

Table 4 -
Average performance metrics of AUTOCOD for ICD-10 chapter classification, by period

Table 8 -
Comparison between sensitivity values of AUTOCOD, depending on period (total, excess mortality or without excess mortality periods), by ICD-10 block