“Longevity risk management through Machine Learning: state of the art”

Longevity risk management is an area of the life insurance business where the use of Artificial Intelligence is still underdeveloped. The paper retraces the main results of the recent actuarial literature on the topic to draw attention to the potential of Machine Learning in predicting mortality and, consequently, in improving longevity risk quantification and management, with practical implications for the pricing of life products with long-term duration and for the lifelong guaranteed options embedded in pension contracts or health insurance products. The application of AI methodologies to mortality forecasts improves both the fitting and the forecasting of the models traditionally used. In particular, the paper presents the Classification and Regression Tree framework and the Neural Network algorithm applied to mortality data. The literature results are discussed, focusing on the forecasting performance of the Machine Learning techniques relative to the classical models. Finally, a reflection on both the great potential of using Machine Learning in longevity management and its drawbacks is offered.


INTRODUCTION
The availability of large datasets and the advances in Artificial Intelligence (AI) to analyze and extract information from the data represent a challenge for the insurance sector. It is well known that the insurance industry is a data-driven business, so AI can have significant consequences on its processes and decisions. A survey conducted by Deloitte (2018) and the European Financial Management Association shows that the activities in which AI could have the strongest influence vary substantially from one sector to another. In the insurance sector, 56% of the respondents consider Risk Management the area in which AI has the greatest impact, versus 29% in the banking sector.
Regarding the insurance business, longevity risk management is part of the wider Enterprise Risk Management (ERM) of an insurance company (Pitacco, 2020). ERM is a comprehensive approach that goes from risk identification to risk assessment (spanning product design, pricing, and natural hedging techniques), capital allocation, and risk monitoring. Certainly, there are areas of risk management where the use of AI is more developed, like the risk monitoring of the underwriting process, and others where the potential benefits of automated predictive modeling have not yet been fully exploited. This is the case of the management of longevity risk in the life business, for whose assessment many insurers still rely on traditional methods and make predictions resorting to classical demographic frameworks based on extrapolative models. Some stochastic longevity models have been proposed starting from the classical financial literature on interest rates, due to similarities between the mortality and interest rate processes (e.g., Milevsky & Promislow, 2001; Cairns et al., 2006; Biffis, 2004; Dahl, 2004; Schrager, 2006). Other models exploit traditional statistical tools to fit the model to real data, using the Singular Value Decomposition of the matrix of mortality rates by age and time or Principal Component Analysis. The Lee-Carter model (Lee & Carter, 1992) is widely recognized as the cornerstone of mortality modeling and forecasting; it introduces a model for central death rates involving both age- and time-dependent terms. The Renshaw-Haberman model (Renshaw & Haberman, 2003, 2006) introduces, for the first time, a cohort parameter to catch the observed variations in mortality among individuals belonging to different cohorts. The assumption that the error term of the model is described by white noise is violated because the force of mortality shows higher variability at older ages than at younger ones. For this reason, Brouhns et al. (2002) have extended the basic model by describing the number of deaths according to a Poisson distribution. Other approaches use penalized splines to smooth mortality and derive future mortality patterns (Currie et al., 2004). Generally, all these models have been applied to national data consisting of yearly observations. However, actuaries are more interested in deriving the underlying mortality of given portfolios. An important difference between mortality data aggregated at the national level and portfolio-specific mortality relates to the range of the observation period: a given portfolio often does not contain observations over very long periods as national datasets do, though it contains many other descriptive variables.
In Section 1, the advancements of the recent literature on longevity risk management through Machine Learning techniques applied to mortality forecasting, aimed at improving the predictability of the classical models, are retraced. Section 2 presents the family of generalized age-period-cohort models and the main accuracy measures. Section 3 retraces the methods presented in the literature on mortality modeling with ML: a brief overview of the Classification and Regression Trees (CART) approach applied to mortality and a global perspective on NNs in mortality modeling are provided. An introduction to the basic idea behind the NN architecture is presented, and the benefits of NNs both as an integration and as a full replacement of canonical models are explored. Sections 4 and 5 are devoted to describing and discussing the results and the benefits of using ML techniques as complementary to standard mortality models. This way of operating is likely to please the longevity risk managers who are unwilling to use algorithms that are not immediately explainable. The last section is devoted to the findings.

LITERATURE REVIEW
Recently, AI in general and Machine Learning (ML) in particular have appeared in actuarial research and practice. While scientific production and applied research boast an increasing number of contributions in non-life insurance (Wüthrich et al., 2019; Ferrario et al., 2018; Gabrielli et al., 2019; Noll et al., 2020), the life insurance sector suffers from the unwillingness of demographers to replace the traditional models with a sort of “black boxes”, referring to how ML algorithms operate. Nevertheless, the first results of the application of ML techniques to mortality forecasts and longevity risk management have been presented in recent years. At first, some early unsupervised methods for mortality analysis were introduced with applications to different fields of medicine; lately, they have been exploited by demographers (Carracedo et al., 2018). A common component of the underwriting process is the estimation of individual-level mortality risk. Traditionally, this is performed manually using human judgment and point-based systems, subject to inconsistency and limiting the ability to price products efficiently. On the contrary, the availability of wide historical datasets represents a challenge for ML to support real-time automated decision-making by exploiting the available data in depth.
ML techniques permit us to integrate a stochastic model with a data-driven approach. In addition to improving the underwriting process, a more accurate longevity risk quantification can sustain the reinsurance of this “toxic risk” (Blake et al., 2006) and the development of the longevity capital market. The contribution given by the use of hedging assets designed for longevity management has been highlighted in Cocco and Gomes (2012). Different longevity assets have been designed: firstly mortality and longevity bonds, afterward forwards and swaps. The significant reduction in the forecasting error reached through the application of ML and DL techniques is particularly useful in the pricing of these products. The analysis of the impact of the mortality rates provided by these models on the pricing of longevity derivatives has been investigated, for example, in Levantesi and Nigri (2019).
This paper aims to describe the main results of the recent actuarial literature on applying AI methodologies to mortality forecasts to improve both the fitting and the forecasting of the models traditionally used, so as to stimulate their use in actuarial practice, with relevant implications for longevity risk management. Note that all the contributions mentioned here work on data aggregated by country, age, and gender, not on individual features.

METHODS
The class of the generalized age-period cohort (GAPC) models (Villegas et al., 2015) embraces most of the stochastic mortality models proposed in the literature, where the effects of age a, calendar year t, and cohort c = t − a are explained by the following predictor:

η_{a,t} = α_a + Σ_{i=1}^{N} β_a^{(i)} κ_t^{(i)} + β_a^{(0)} γ_{t−a},

where α_a is the age-specific parameter giving the average age-specific mortality pattern; κ_t^{(i)} denotes the i-th time-varying index of general mortality and β_a^{(i)} modifies its effect across ages, so that it represents the deviations from the age trend as κ_t^{(i)} varies (their product is the age-period factor of the mortality trend); γ_{t−a} is the cohort parameter and β_a^{(0)} modifies its impact across ages. According to the GAPC framework, the Lee-Carter model is described as follows:

η_{a,t} = log m_{a,t} = α_a + β_a κ_t,

where the predictor η_{a,t} is the logarithm of the central death rate m_{a,t}. Throughout the paper, reference is made to the Lee-Carter model as extended by Brouhns et al. (2002), based on the assumption that the observed numbers of deaths, D_{a,t}, in the population are realizations of a random variable following a Poisson distribution.
Two constraints need to be introduced in the model to solve the identifiability questions of the parameters:

Σ_a β_a = 1,   Σ_t κ_t = 0.

The time trend κ_t is described by an autoregressive integrated moving average (ARIMA) process to estimate the future probabilities; an ARIMA(0,1,0), i.e., a random walk with drift, usually provides the best fit to the data. The mortality forecasting literature does not provide a vast space for ML modeling, which is still unfamiliar to both actuaries and demographers. The main contributions are from Deprez et al. (2017), Levantesi and Pizzorusso (2019), and Levantesi and Nigri (2019); the common idea behind all these works is to improve the fitting accuracy of canonical models using ML algorithms, in other words, to correct the mortality surface produced by standard stochastic mortality models. All the proposed methods, adopting CART, calibrate an ML estimator used to adjust (and improve) the mortality rates estimated by the original mortality model. These authors show that mortality modeling can benefit from ML, which better captures patterns that traditional models do not identify. Specifically, Deprez et al. (2017) employ decision trees to overcome the weaknesses of different mortality models, also investigating cause-of-death mortality; Levantesi and Pizzorusso (2019) use decision trees, random forest, and gradient boosting to calibrate an ML estimator; finally, Levantesi and Nigri (2019) focus on the random forest algorithm.
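As a minimal illustration of the model and constraints above, the Lee-Carter parameters can be estimated from a matrix of log central death rates via SVD, with κ_t then projected as a random walk with drift (the ARIMA(0,1,0) case). The synthetic data and function names below are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit log m_{a,t} = alpha_a + beta_a * kappa_t by SVD (Lee & Carter, 1992).

    Imposes the identifiability constraints sum(beta) = 1 and sum(kappa) = 0.
    log_m: (ages x years) matrix of log central death rates.
    """
    alpha = log_m.mean(axis=1)                       # average age profile
    U, s, Vt = np.linalg.svd(log_m - alpha[:, None], full_matrices=False)
    beta = U[:, 0] / U[:, 0].sum()                   # normalize: sum(beta) = 1
    kappa = s[0] * Vt[0, :] * U[:, 0].sum()
    alpha = alpha + beta * kappa.mean()              # absorb the mean of kappa
    kappa = kappa - kappa.mean()                     # center: sum(kappa) = 0
    return alpha, beta, kappa

def forecast_kappa(kappa, horizon):
    """Project kappa_t as a random walk with drift, i.e. ARIMA(0,1,0)."""
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)
    return kappa[-1] + drift * np.arange(1, horizon + 1)

# Synthetic example: 5 ages, 20 years with a linear mortality decline.
ages, years = 5, 20
true_beta = np.linspace(0.1, 0.3, ages)
true_beta /= true_beta.sum()
true_kappa = -1.5 * np.arange(years, dtype=float)
true_kappa -= true_kappa.mean()
log_m = np.linspace(-6, -2, ages)[:, None] + np.outer(true_beta, true_kappa)

alpha, beta, kappa = fit_lee_carter(log_m)
kappa_future = forecast_kappa(kappa, horizon=10)
```

On real HMD data, log_m would simply be the observed matrix of log death rates by age and year.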
All these contributions work on mortality data downloaded from the Human Mortality Database (HMD). At the state of the art, there are still no academic contributions in the field of ML working with mortality data of insurance portfolios.
The regression and decision tree algorithm, known as CART (Classification and Regression Trees), works on the feature space. The algorithm divides the predictor space into sub-spaces, partitioning it through a sequence of binary splits (Hastie et al., 2016). The ML literature has rapidly shown a relevant evolution, adopting ensemble methods (e.g., random forest and gradient boosting machine) that improve CART performance by aggregating many trees.
In the literature, the ML implementation of mortality models refers to four categorical variables identifying an individual: gender (g), age (a), calendar year (t), and year of birth (c). Therefore, the model assigns each individual the feature x = (g, a, t, c) belonging to the feature space X. The mortality improvement reached by the ML algorithm is given by the relative changes of the central death rates estimated with and without the ML adjustment. As shown in Levantesi and Nigri (2019), this approach can also be used to analyze the limits of the traditional mortality models.
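The feature coding described above can be sketched as follows; the gender encoding and the age/year ranges are illustrative assumptions.

```python
import numpy as np

# Build the feature set x = (g, a, t, c) over all combinations of gender,
# age and calendar year, with the cohort c = t - a derived from the others.
genders = np.array([0, 1])            # 0 = female, 1 = male (encoding assumed)
ages = np.arange(60, 65)
years = np.arange(2000, 2003)

G, A, T = np.meshgrid(genders, ages, years, indexing="ij")
features = np.column_stack([G.ravel(), A.ravel(), T.ravel(), (T - A).ravel()])
```

Each row of `features` is one cell of the gender-age-year grid on which the ML estimator is calibrated.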

The ML algorithms used in the applications provided by the specific literature on mortality modeling are Decision Tree (DT), Random Forest (RF), and Gradient Boosting Machine (GBM); they are summarized as follows.
Let {X_τ}_{τ=1,…,J} be a partition of X, where J is the number of distinct and non-overlapping regions. The DT estimator, given a set of variables x, is defined as

f̂_DT(x) = Σ_{τ=1}^{J} c_τ · I(x ∈ X_τ),

where I(·) is the indicator function and c_τ is the constant prediction in region X_τ. The regions X_τ are found by minimizing the residual sum of squares.
RF aggregates many DTs, obtained by generating bootstrap training samples from the original dataset (Breiman, 2001). This algorithm’s main characteristic is to select a random subset of predictors at each split, thus preventing the dominance of strong predictors in the splits of each tree (James et al., 2017). The RF estimator is calculated as

f̂_RF(x) = (1/B) Σ_{b=1}^{B} f̂_b(x),

where B is the number of bootstrap samples and f̂_b is the tree estimator fitted on sample b. GBM considers a sequential approach in which each DT uses the information from the previous one to improve the current fit (Friedman, 2001). Given the current fit f̂_i at each stage i (for i = 1, 2, …, N), the GBM algorithm provides a new model by adding an estimator h to the fit:

f̂_{i+1}(x) = f̂_i(x) + λ h_i(x),

where h_i ∈ H, the family of differentiable functions, and λ is the multiplier derived through the optimization problem.
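A minimal sketch of the three estimators above, using a one-split regression tree (a “stump”) in place of full CART; function names and the toy data are illustrative assumptions. The stump minimizes the residual sum of squares over candidate thresholds, the bagging loop averages stumps fitted on bootstrap samples as in RF (without feature subsampling, since there is a single predictor), and the boosting loop adds λ-scaled stumps fitted to the current residuals as in GBM.

```python
import numpy as np

def fit_stump(x, y):
    """One-split regression tree: choose the threshold minimizing the RSS."""
    best = (np.inf, x[0], y.mean(), y.mean())
    for thr in np.unique(x)[:-1]:
        left, right = y[x <= thr], y[x > thr]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best[0]:
            best = (rss, thr, left.mean(), right.mean())
    return best[1:]                      # (threshold, left value, right value)

def predict_stump(stump, x):
    thr, left_val, right_val = stump
    return np.where(x <= thr, left_val, right_val)

def fit_rf(x, y, n_trees=25, seed=0):
    """Bagging: average stumps fitted on bootstrap samples of the data."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))    # bootstrap sample
        stumps.append(fit_stump(x[idx], y[idx]))
    return stumps

def predict_rf(stumps, x):
    return np.mean([predict_stump(s, x) for s in stumps], axis=0)

def fit_gbm(x, y, n_rounds=50, lr=0.3):
    """Boosting: each stump is fitted to the residuals of the current fit."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)           # fit to current residuals
        pred = pred + lr * predict_stump(stump, x)
        stumps.append(stump)
    return y.mean(), stumps, pred

# Toy data: a step function that a single split recovers exactly.
x = np.arange(10.0)
y = np.where(x < 5, 0.0, 1.0)
stump = fit_stump(x, y)
rf_stumps = fit_rf(x, y)
gbm_base, gbm_stumps, gbm_pred = fit_gbm(x, y)
```

The mortality papers cited above apply the full DT/RF/GBM algorithms to the ratio of observed to modeled deaths rather than to toy data; the structure of the three fits is the same.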
According to Levantesi and Pizzorusso’s (2019) framework, an ML estimator improves the mortality forecasts obtained by implementing the Lee-Carter model. The ML estimator is modeled with the same structure as the underlying mortality model, so that its parameters can be fitted and projected with the usual techniques. The authors underline that adopting the original mortality model for both the fitting and the forecasting of the ML estimator leads to an improvement in projection and makes it easy to understand the effect of such improvements on the model’s parameters.
Among the CART methods, the Lee-Carter model might also be enhanced through the model suggested by Levantesi and Nigri (2019). Despite the widespread skepticism among demographers and actuaries, it is possible to record a steady increase of scientific contributions to mortality modeling through DL. Albeit the proposed literature shows prominent approaches, the applications in the global panorama seem too uneven, jeopardizing the practitioners’ interpretation and most likely pushing down the attractiveness of DL and NNs. In light of that, a clear recap and exposition of NNs in mortality modeling is required.
The term NN refers to a computational model inspired by the human brain, where multiple processing layers are employed to learn information from data represented at multiple levels of abstraction, solving even the most complex problems. Its shape is based on neurons, the links between them, and learning algorithms (backpropagation). A NN works as a weighted regression in which each unit passes on “weighted” information through the activated links, and the activation function transforms the weighted sum of the input signals into the output. Albeit DL embodies a wide variety of networks differing from each other by architectural structure, the basic scheme of a NN, the so-called feedforward structure, is presented here.
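The feedforward computation just described, weighted sums passed through activation functions layer by layer, can be sketched as a plain forward pass; sizes, weights, and names are illustrative assumptions.

```python
import numpy as np

def forward(x, layers):
    """One feedforward pass: each layer computes phi(W y + b).

    layers: list of (W, b, phi) tuples with W of shape (n_out, n_in).
    """
    y = x
    for W, b, phi in layers:
        y = phi(W @ y + b)              # weighted sum of inputs, then activation
    return y

relu = lambda z: np.maximum(z, 0.0)
identity = lambda z: z

# Toy network: 3 inputs -> 2 hidden neurons (ReLU) -> 1 output (linear).
rng = np.random.default_rng(42)
layers = [
    (rng.normal(size=(2, 3)), np.zeros(2), relu),
    (rng.normal(size=(1, 2)), np.zeros(1), identity),
]
out = forward(np.array([1.0, -0.5, 2.0]), layers)

# Sanity check with hand-set weights: identity matrix, ReLU -> [2., 0.]
check = forward(np.array([2.0, -3.0]), [(np.eye(2), np.zeros(2), relu)])
```

Training (backpropagation) is omitted; the sketch only shows how the output of each neuron is the activated weighted sum of the previous layer’s outputs.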
Let n_j be the number of neurons in layer j and φ_{i,j} a generic activation function. The output of the i-th neuron in layer j, denoted by y_{i,j}, is computed as

y_{i,j} = φ_{i,j}( Σ_{k=1}^{n_{j−1}} w_{k,i,j} y_{k,j−1} + b_{i,j} ),

where the w_{k,i,j} are the connection weights and b_{i,j} is the bias term. The contributions on this topic are from Hainaut (2018), Richman and Wüthrich (2018), and Nigri et al. (2019, 2020). All of them apply NNs as universal function approximators, aiming to solve different tasks, thus deserving separate discussions. The framework used in Hainaut (2018) aims to replace the singular value decomposition (SVD) in the Lee-Carter model. Specifically, the author proposes a two-stage estimation: firstly, through an autoencoder, the mortality dataset is reduced to a small number of latent factors; during the second step, an econometric model is applied to forecast the latent variables. The neural analyzer is adjusted to different countries using as input the centered log-forces of mortality, log μ_x(t) − α_x. The encoding and decoding functions are calibrated to minimize the sum of squared residuals between the observed and the generated mortality curves. This approach aims to overcome the parameter estimation issues related to SVD, mainly the lack of nonlinear components, which affects the estimation of the mortality surface. It is worth noting that the forecasting method remains the same, so that the extrapolation over time relies on canonical models, with the classical limitations pointed out in Nigri et al. (2019), where a DL integration into the Lee-Carter model is suggested. They underline the key role of the κ_t parameter in depicting the future nonlinear mortality behavior. More precisely, Nigri et al. (2019) propose an alternative process applying an RNN with an LSTM architecture to describe the evolution of κ_t over time. Indeed, taking into account both long- and short-term dynamics, the LSTM can treat the time series noise, reproducing it in the forecasted trend. The authors propose an LSTM model that approximates the function f linking κ_t to its delayed values:

κ_t = f(κ_{t−1}, κ_{t−2}, …, κ_{t−J}) + ε_t,

where ε is a homoscedastic error term. To implement the LSTM algorithm, the dataset is split into the training set, where supervised learning is carried out, and the testing set.
During the training, the network learns the input-output relationship; the function that describes such a link can predict the variable of interest, in this case κ_t, starting only from the input. In the dataset, the input is an (n × J) matrix of the time lags of κ_t, and the output is the (n × 1) vector of its current values, where n ∈ N is the number of instances. The forecasted values of κ_t are derived recursively, each step using only previously predicted values as inputs. As regards the CART-based approaches, Levantesi and Pizzorusso (2019) obtain better results for RF, which proves more effective than DT and GBM; both fitting and forecasting appear improved.
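The supervised setup just described, a matrix of J lagged values of κ_t as input and its current value as output, plus the recursive forecasting step, can be sketched generically. A trivial slope-extrapolation rule stands in for the trained LSTM, whose internals are beyond this sketch; names and the toy series are illustrative assumptions.

```python
import numpy as np

def lag_matrix(series, J):
    """Build supervised pairs: X[i] holds J lagged values, y[i] the current one."""
    X = np.array([series[i:i + J] for i in range(len(series) - J)])
    y = series[J:]
    return X, y

def recursive_forecast(f, history, J, horizon):
    """Forecast iteratively: each new value is fed back as an input lag."""
    window = list(history[-J:])
    out = []
    for _ in range(horizon):
        nxt = f(np.array(window))
        out.append(nxt)
        window = window[1:] + [nxt]      # slide the lag window forward
    return np.array(out)

kappa = np.arange(20.0)                  # toy linear trend standing in for kappa_t
X, y = lag_matrix(kappa, J=3)
# A stand-in one-step predictor; an LSTM would instead be trained on (X, y).
step_ahead = lambda lags: lags[-1] + (lags[-1] - lags[-2])
future = recursive_forecast(step_ahead, kappa, J=3, horizon=5)
```

Replacing `step_ahead` with a trained recurrent network gives exactly the recursive out-of-sample scheme described in the text.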

Following the same research line, Levantesi and Nigri (2019) propose to extrapolate the ML estimator over the whole mortality surface using two-dimensional P-splines, obtaining more accurate mortality projections. Moreover, the authors develop a sensitivity analysis of the model to the predictors, aiming to investigate whether the results are reasonable from a demographic perspective, and a sensitivity analysis on the age interval to analyze whether there are improvements on a reduced dataset. The numerical application is based on the mortality datasets of Australia, France, Italy, Spain, the UK, and the USA for both genders. The authors obtain significant improvements in the mortality projection of the Lee-Carter model by applying the RF algorithm, and these results hold for all the analyzed countries.
The innovation through DL models is driven by constant mortality improvement, addressing the need to better understand changes in future mortality dynamics. In this scenario, the canonical Lee-Carter model and its evolutions remain the gold standard against which future models’ performance is compared. Hainaut (2018) uses data for French, UK, and US mortality rates; the training dataset comprises the years 1946-2000 and the test set the years 2001-2014. The results show that the NN approach strongly outperforms the Lee-Carter model with and without cohort effects. Since this method acts only as an estimation procedure, the author states that the empirical evidence supports a simple random walk as a plausible method to forecast the latent factors. On the other hand, Nigri et al. (2019) point out the relevant implications of exploiting an extension of the Lee-Carter model based on an RNN with LSTM architecture to forecast the latent factor κ_t.
The authors investigate six countries, showing better accuracy levels in the mortality trend forecast than the classical approach. Finally, Richman and Wüthrich (2018), pursuing a pure NN approach, project multi-population mortality rates, exploiting the main advantage of the multi-population approach, i.e., its robustness compared to single-population models, due to common factors such as population health, economics, and technology. The authors use all the countries in the Human Mortality Database, considering data from 1950 to 1999 as the training set and data from 1991 to 1999 as the testing set; the forecasting performance is then assessed on the validation set from 2000 onwards.

DISCUSSION
The diffusion of ML techniques in the insurance sector, to deal with the large available datasets and extract important information from them, is increasing competitiveness, leading to more effective risk management and more efficient pricing policies. This work has retraced the advances of the recent literature on the application of ML techniques to longevity risk. While the use of modern predictive techniques is now widespread in many areas of the insurance business, in practice the management of longevity risk is still entrusted to traditional approaches. In this context, the first advantage AI could generate is reducing the information asymmetry between insurer and policyholder: a better understanding and quantification of each policyholder’s specific risk can be reached by simultaneously exploiting the information present in national and international datasets and that relating to individual portfolios. This is useful in pricing life products with long-term duration and lifelong guaranteed options embedded in pension contracts. Another AI opportunity is the improvement of longevity risk quantification for health insurance and long-term care products, thanks to the exploitation of patterns in the available medical and socio-economic data, since traditional risk models require a very long time to process them. AI favors the advances of microeconomic and structural models, and risk management is shifting from statistical methods to supervised tree algorithms that automatically select the variables of interest and better identify the connections in the data. The available data could be integrated with those offered by smart sensor technologies for policyholders’ health monitoring.

CONCLUSION
Detailed considerations on the regulatory and ethical aspects deriving from the introduction of ML techniques in the insurance sector are beyond the scope of this work, which instead aims to stimulate reflection on the advantages that these techniques, implemented in practice, could bring to the quantification of longevity risk. However, besides the great potential of ML, there are also some drawbacks. First of all, one issue concerns the forecasts’ bias, which can be significant for mortality datasets. As is well known in data science, bias may result from how the data was collected: if the algorithm is trained on a dataset that is not representative of the real distribution of the population, the forecasts could be biased. This can happen if the algorithms used to estimate a specific insurer’s portfolio longevity risk are calibrated on national data. At the same time, the problem is partially solved if internal databases built on the historical experience of the individual company are available. On closer inspection, this falls back on the classic question of the possibility of incurring the so-called basis risk, from which not even the traditional models are free. Another widely known limit of algorithms trained on big data is the possibility of spurious correlations between the variables in these large datasets. On the other hand, through ML, the output of internal risk models can be validated and improved continuously. One thing is certain: the introduction of AI requires companies to change their business culture. The crucial point is to develop processes and tools that give stakeholders the possibility to understand how risks can be identified and managed within the limits set by the firm’s risk culture. With this awareness, it is important to assess whether the results of the application of ML are in line with the results produced by classical models and to find the explanation for any variance.
In light of the above, it can be argued that ML can offer insurance companies and pension fund managers new tools and methods supporting actuaries in classifying longevity risks, offering accurate predictive pricing models, and reducing losses. In addition to the areas in which ML methodologies are already used, the transition from the traditional modeling of longevity risk to an innovative one could represent a challenge for the life insurance sector.

The feature space X could be enriched with other information, such as income, marital status, or smoking status. The model requires that the number of deaths D_x satisfies the age-independence condition for x ∈ X and follows a Poisson distribution, D_x ~ Poisson(E_x m_x), where m_x is the central death rate and E_x the exposure. Let d_x^j denote the expected number of deaths under the chosen stochastic mortality model j, and m_x^j the corresponding central death rate. Following the framework in Deprez et al. (2017), the initial condition of the model is ψ_x ≡ 1, which is equivalent to stating that the mortality model completely fits the crude rates; this is an ideal condition, as no model completely fits the crude rates in the real world. The aim of the approach used in Deprez et al. (2017), Levantesi and Pizzorusso (2019), and Levantesi and Nigri (2019) is essentially to calibrate the parameter ψ_x with an ML algorithm in order to improve the fitting accuracy of the mortality model. The estimator ψ̂_x is found as the solution of a tree-based algorithm applied to the ratio between the observed deaths and the deaths calculated through the specified mortality model, D_x / d_x^j. To reach a better fit of the observed data, the central death rate of the mortality model is then adjusted as m_x^{j,ML} = ψ̂_x^{j,ML} · m_x^j, where j denotes the mortality model and ML the ML technique.
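A minimal numerical sketch of the calibration above: compute the ratio D_x/d_x^j, smooth it with a piecewise-constant fit over age bins (a crude stand-in for the DT/RF/GBM calibration used in the cited papers), and rescale the model rates. All names and the toy Gompertz-like rates are illustrative assumptions.

```python
import numpy as np

def calibrate_psi(D_obs, d_model, ages, bin_width=5):
    """Estimate psi_x by averaging the ratio D_x / d_x^j within age bins.

    A piecewise-constant fit over age bins mimics the leaves of a regression
    tree; the cited papers calibrate psi with DT/RF/GBM instead.
    """
    ratio = D_obs / d_model
    bins = (ages - ages.min()) // bin_width
    psi = np.empty_like(ratio)
    for b in np.unique(bins):
        psi[bins == b] = ratio[bins == b].mean()
    return psi

ages = np.arange(60, 80)
m_model = 0.01 * np.exp(0.09 * (ages - 60))     # toy Gompertz-like model rates
E = np.full(ages.shape, 10_000.0)               # exposures
d_model = E * m_model                           # expected deaths under the model
D_obs = d_model * 1.10                          # observed deaths 10% above model
psi_hat = calibrate_psi(D_obs, d_model, ages)
m_adjusted = psi_hat * m_model                  # m_x^{j,ML} = psi_hat * m_x^j
```

In this toy setup the observed deaths are uniformly 10% above the model, so the calibrated ψ̂_x recovers the factor 1.10 in every bin and the adjusted rates match the data.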

The estimates of ψ_x are then obtained by applying an ML algorithm.
In the fitting-accuracy measure, the sum runs over the data dimension, and m_x and m̂_x are respectively the observed and the estimated central death rates. The models’ forecasting accuracy is instead calculated through the root mean squared error, comparing the future mortality rates in an out-of-sample test.
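The out-of-sample measure just mentioned amounts to the following computation (the toy rates are illustrative):

```python
import numpy as np

def rmse(observed, forecast):
    """Root mean squared error between observed and forecasted mortality rates."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((observed - forecast) ** 2)))

err = rmse([0.010, 0.012, 0.011], [0.011, 0.012, 0.013])
```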
where x(t) is a vector of dimension n_x = x_max − x_min and α_x is the vector of average log-mortality rates according to the Lee-Carter formulation. The basic idea is to determine two functions, an encoding and a decoding function. With respect to the classical Lee-Carter model, the factor β_x κ_t is substituted with a nonlinear function through which the log-mortality forces are calculated. Deprez et al. (2017) implement the decision tree algorithm to verify the goodness of the mortality estimates obtained by implementing the Lee-Carter and Renshaw-Haberman models. They backtest a mortality model to highlight its ability to explain mortality for each age, year of birth, and gender. The case study is based on the mortality data of Switzerland for both genders. The results show improvements in the accuracy of the mortality fitting of the Lee-Carter and Renshaw-Haberman models, because there are factors that are not well caught by these models. Levantesi and Pizzorusso (2019) enhance the Lee-Carter, Renshaw-Haberman, and Plat mortality projections through DT, RF, and GBM. These algorithms work on the ratio between the observed deaths and those estimated by a given model. The results show an increase in the accuracy of the projections of all three models. The case study is developed on the Italian mortality dataset for both genders.