A Novel Model Validation Method Based on Area Metric Disagreement between Accelerated Storage Distributions and Natural Storage Data

Abstract: Quantifying the credibility of an accelerated storage model has long been a challenge. This paper introduces a quantitative measure named CMADT (Credibility Metric of Accelerated Degradation Test), which quantifies the credibility of an accelerated aging model based on available data. The benchmark data are obtained from the natural storage test. CMADT measures the area of disagreement between the probability distribution produced by the accelerated storage model and the benchmark data. Because an accelerated aging model may involve multiple parameters, yielding several single-parameter CMADTs, this paper also proposes a method that integrates the single-parameter metrics into one overall metric to assess the credibility of the accelerated storage model as a whole. Moreover, CMADT applies to sample data of different scales. The cases addressed in this paper show that CMADT helps designers and decision-makers judge the credibility of results obtained from an accelerated storage model intuitively and makes it easier to compare various products horizontally.


Introduction
Weapons such as missiles and torpedoes are characterized by long-term storage and one-time use, and the majority of their life cycle is spent in storage. During the long-term storage period, factors such as corrosion, aging, and material surface and interface reactions cause the material properties and physical parameters of products to change gradually, which ultimately leads to product failure when functional performance requirements can no longer be met.
During the design phase, we need to assess the weapon storage lifetime, including that of components, raw materials, and parts, to ensure that the elements used in the equipment can meet the storage lifetime criterion. At the design finalization stage, the storage lifetime of the overall system needs to be verified, and after delivery, there is still a need for periodic sampling and testing of products in service and prediction of their remaining life.
The accelerated storage test (AST) is one of the critical techniques used in the aforementioned storage lifetime assessment. AST is a testing technique that elevates a particular stress quantity to obtain the crucial parameters of the product's performance degradation during storage while the failure mechanism remains constant [1][2][3]. It is practically useful in engineering due to its high speed and efficiency. Since AST obtains degradation data by rapidly increasing the corresponding stress, it is possible that the failure mechanism will change due to excessive stress [4]. That is, an invariant failure mechanism is a precondition for AST [5,6]. Once the failure mechanism changes, the assessment results will deviate from the real state of the product. Many factors may affect the credibility of AST results, such as test equipment errors, human errors, fluctuations in the test environment, and sample dispersion. Even if all of these factors could be controlled precisely, there would still be disagreement between the quantities inferred from the AST model under normal stress, such as the storage lifetime, and the system's real state. In addition, there is a continuing debate as to whether the AST model is correct and truly reflects the acceleration process of various products.
In recent decades, extensive research has been carried out both domestically and abroad to investigate the alteration of the failure mechanism that occurs in the accelerated testing process. In general, these studies can be divided into two categories. The first uses the perspective of failure physics to judge the consistency of the failure mechanism under different stresses. This aim is mainly achieved through a comprehensive analysis of the chemical and microscopic structures, as well as destructive physical analysis, to judge whether samples under different stresses are consistent in terms of microscopic appearance, element distribution, material properties, and other factors [7][8][9][10]. This approach has a clear physical basis but can only qualitatively judge whether the failure mechanism has changed; it is difficult for it to provide quantitative consistency test results. Spearman's rank correlation coefficient [11] and grey theory [12,13] have been used to identify the consistency of the failure mechanism based on the shape of the degradation path. From the perspective of the pseudo-life distribution, the literature [14,15] has evaluated reliability and predicted lifetime, assuming that the pseudo-life obeys a log-normal or Weibull distribution and using the F-statistic and the Bartlett statistic. The variation in the shape parameter of the Arrhenius model also provides a way to explore the consistency of the failure mechanism [16]. Cai et al. [17] proposed a change-point model for the coefficients of variation to fit the abrupt change behavior of the failure mechanisms with a nonparametric empirical likelihood approach, which was applied to the lifetime data of the metal oxide semiconductor transistors in the power distribution system of the Chinese Tiangong space station. Zhai et al. [18] proposed a method for consistency testing of ADT (accelerated degradation test) failure mechanisms based on the activation energy invariance method and the likelihood ratio test, accounting for the degradation dispersion caused by the manufacturing technology.
The aforementioned techniques ensure the credibility of AST results to some extent, even though a few defects remain. For example, the method based on failure physics only qualitatively determines whether the failure mechanism has changed. The data-driven consistency discrimination methods only determine the consistency of the failure mechanism, without assessing the loss of result credibility caused by factors such as measurement error and the applicability of the accelerated model. For an AST result, the most reliable criterion is to test its consistency with the corresponding natural storage test data. However, how should the degree of consistency be judged, and to what extent? Good evaluation metrics are still lacking. This paper proposes a new method for evaluating the credibility of accelerated storage test data based on the idea of an area metric, with small samples of natural storage test data as the benchmark. The remainder of this paper is organized as follows: Section 2 introduces the theory of the probability distribution area metric. Section 3 defines a credibility metric called CMADT, which is derived from the aforementioned theory and is applied to assess the credibility of AST results. An engineering use case demonstrating the validity of the assessment metric is presented in Section 4, and Section 5 offers conclusions with a summary of the main findings.

CMADT Credibility Metric for the Accelerated Aging Test
The main stresses loaded in the accelerated storage test are high-temperature single stress and temperature-humidity double stress. There are four types of accelerated storage tests: constant stress, step stress, step-down stress, and sequential stress, according to the different methods of stress loading. Regardless of which stress and stress loading methods are used, we can derive the degradation model of key performance parameters under normal stress (e.g., a temperature of 25 °C) through performance degradation modeling and accelerated model solving [19][20][21][22].
In engineering practice, there are often some natural storage test data in addition to accelerated storage test data. For example, the natural storage test is carried out during the initial sample stage of product development, and the natural storage test data can be obtained during the final evaluation. Natural storage data can also be obtained during the service stage after the equipment is delivered every year. Generally, natural storage data have a higher confidence level, but there are often two problems. First, the storage period is short; for example, it is necessary to assess the reliability of product storage for 20 years, while natural storage data often only have a few years of storage time data. Second, the sample size is often small, and the natural storage data are not sufficient to provide high-confidence assessment results. For the above reasons, storage life assessment is often provided by accelerated storage tests in engineering practice, while natural storage test data are mainly used to verify the correctness of accelerated storage tests.

Area Metric for a Single Parameter
The area metric was first proposed in 2008 by American scholars Ferson and Oberkampf [23,24], and developed by LI [25], JI [26] and ZHANG [27], et al.; it is a confirmation metric based on the probability distribution distance, which is mainly used in the field of modeling simulations. Figure 1 shows that by calculating the area between the simulation model response and the empirical cumulative distribution function of the experimental observations (shaded part in Figure 1), the accuracy of the simulation model is quantified and evaluated.
Suppose that the cumulative distribution function of the equivalent data of a product's key performance parameter, obtained from the accelerated storage test at time t_i, is F^a_{t_i}(x), and that the cumulative distribution function of the r samples tested in the natural storage test at time t_i is F^b_{t_i}(x). The model validation area metric can then be borrowed: F^a_{t_i}(x) plays the role of the cumulative distribution function of the simulation model response, and F^b_{t_i}(x) that of the cumulative distribution function of the test observations.
The area metric for the momentary confidence evaluation of the accelerated storage test at time t_i is defined as

A(F^a_{t_i}(x), F^b_{t_i}(x)) = ∫ |F^a_{t_i}(x) − F^b_{t_i}(x)| dx.  (1)

From Equation (1), it can be seen that the area metric is smaller when the probability distribution of the equivalent data is closer to that of the benchmark data, and vice versa. Therefore, the area metric can be used to assess the confidence level of the accelerated storage test.
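As an illustration, the area metric of Equation (1) can be approximated numerically. The sketch below (an illustrative implementation, not the paper's code) compares a normal model CDF F^a with the empirical CDF F^b of benchmark observations on a grid; the normal-model assumption and all function names are our own.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def area_metric(mu, sigma, observations, n_grid=4000):
    """Approximate A(F^a, F^b) = integral of |F^a(x) - F^b(x)| dx,
    where F^a is a normal model CDF and F^b is the empirical CDF."""
    obs = np.sort(np.asarray(observations, dtype=float))
    lo = min(obs[0], mu - 6.0 * sigma)   # grid wide enough to cover both tails
    hi = max(obs[-1], mu + 6.0 * sigma)
    xs = np.linspace(lo, hi, n_grid)
    f_a = np.array([normal_cdf(x, mu, sigma) for x in xs])
    f_b = np.searchsorted(obs, xs, side="right") / obs.size  # empirical CDF
    gap = np.abs(f_a - f_b)
    # trapezoid rule over the grid
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(xs)))
```

For a single observation placed exactly at the model mean, the metric equals the mean absolute deviation of the normal model, σ√(2/π) ≈ 0.798σ, which gives a quick sanity check.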

Dimensionless Measurement of Area Metrics
The metric defined in Equation (1) carries the dimension of the measured parameter, so it is not clear how large the gap is between parameters of different scales for the same product or between different products, and the evaluation results cannot be compared. It is also not clear how large a gap indicates "poor" agreement or how small a gap indicates "excellent" agreement.
To obtain a unified evaluation criterion, the metric of Equation (1) is made dimensionless in this paper. Its mathematical definition at time t_i is

ρ_i = A(F^a_{t_i}(x), F^b_{t_i}(x)) / A_0(F^a_{t_i}(x)).  (2)

ρ is called the Credibility Metric of Accelerated Degradation Test (CMADT), where A_0(F^a_{t_i}(x)) is a value on the same scale as the area metric A(F^a_{t_i}(x), F^b_{t_i}(x)) and is used to characterize the dispersion of the accelerated storage test data, as shown in Figure 2.

Its expression is

A_0(F^a_{t_i}(x)) = ∫ |F^a_{t_i}(x) − U(x − µ_i)| dx,  (3)

where U(·) is the unit step function, and µ_i and σ_i are the mean and standard deviation of the accelerated storage test degradation data at time t_i, respectively.
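A sketch of the dimensionless CMADT of Equation (2) follows. It assumes a normal model for the accelerated-test data, in which case the dispersion area between the model CDF and a unit step at the mean reduces to the closed form σ√(2/π) (the normal mean absolute deviation); both the normal assumption and the function names are illustrative.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cmadt(mu, sigma, observations, n_grid=4000):
    """rho_i = A(F^a, F^b) / A0(F^a), cf. Eq. (2).

    A0 is taken here as the area between the model CDF and a unit step
    at the model mean; for a normal model this equals sigma*sqrt(2/pi)
    (an assumption used only for illustration)."""
    obs = np.sort(np.asarray(observations, dtype=float))
    lo = min(obs[0], mu - 6.0 * sigma)
    hi = max(obs[-1], mu + 6.0 * sigma)
    xs = np.linspace(lo, hi, n_grid)
    f_a = np.array([normal_cdf(x, mu, sigma) for x in xs])
    f_b = np.searchsorted(obs, xs, side="right") / obs.size  # empirical CDF
    gap = np.abs(f_a - f_b)
    area = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(xs)))
    a0 = sigma * math.sqrt(2.0 / math.pi)
    return area / a0
```

Because both numerator and denominator scale linearly with the data, the ratio is invariant under shifting and rescaling of the measurement units, which is exactly the dimensionless behavior Equation (2) is after.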

From Equation (2), it is clear that ρ_i is dimensionless, which is convenient for providing a unified plausibility measure across the multiple key performance parameters of subsequent accelerated storage tests.

Probability Distributions of Key Performance Parameters Based on Natural Storage Tests with Small Samples
The sample size of the natural storage test is relatively small compared with that of the accelerated aging test; were it not, the degradation model could be obtained directly from the empirical data of the natural storage test.
Assume that the sample size of the natural storage test is r and the corresponding test times are t = {t_1, t_2, · · · , t_L}, where L is the number of tests. The set of key parameter degradation data at time t_i is denoted b̃_i = {b_{i1}, b_{i2}, · · · , b_{ir}}. Considering the small sample, fitting a probability distribution directly would easily introduce strong subjective assumptions that affect the accuracy of the assessment. This paper therefore constructs the upper and lower bounds of a p-box for the natural storage data based on the belief function and plausibility function of D-S evidence theory. This treatment makes no subjective assumption about the distribution type and can effectively retain the statistical characteristics of the original information.
The sequence b̃_i can form the set BI_i consisting of r interval numbers.
The distance from the mean value b_i to each interval number in BI_i is calculated, and the basic probability assignment (BPA) of BI_j (j = 1, 2, · · · , r) is obtained. The belief function and plausibility function are then constructed to obtain the CDF (Cumulative Distribution Function) bounds of the p-box.
Let D = [d_1, d_2] (d_1 ≤ d_2) and E = [e_1, e_2] (e_1 ≤ e_2) be two interval numbers, and let L_p(D, E) denote the distance between the interval numbers D and E. When p = 2,

L_2(D, E) = sqrt((d_1 − e_1)^2 + (d_2 − e_2)^2).

Normalize L_2(BI_j, b_i) over j to obtain the distance vector from the interval numbers BI_j to b_i; in turn, the similarity ξ_j between the interval number BI_j and b_i is defined as a decreasing function of this normalized distance. ξ_j expresses the extent to which the distribution interval of the individual test data resembles the expected value b_i and serves as the basis for assigning a confidence probability to BI_j, i.e., the BPA.
Although the underlying probability distribution of B I j has been obtained, there is no evidence to show what distribution the parameter P obeys in the interval B I j .
Therefore, the p-box of the parameter P is constructed with the plausibility function as the upper bound and the belief function as the lower bound, such that any admissible probability distribution of P falls within this envelope.
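The construction can be sketched as follows: the p-box's lower CDF bound at x accumulates the BPA mass of intervals lying entirely below x (belief), and the upper bound accumulates the mass of intervals that have begun by x (plausibility). The similarity-to-BPA normalization and all names are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def bpa_from_similarity(xis):
    """Turn similarity scores xi_j into a basic probability assignment
    (BPA) by simple normalization, mirroring the role of xi_j above."""
    xis = np.asarray(xis, dtype=float)
    return xis / xis.sum()

def pbox_bounds(intervals, masses, xs):
    """Lower CDF (belief) and upper CDF (plausibility) of a
    Dempster-Shafer structure, evaluated at the points xs."""
    lower, upper = [], []
    for x in xs:
        lower.append(sum(m for (lo, hi), m in zip(intervals, masses) if hi <= x))
        upper.append(sum(m for (lo, hi), m in zip(intervals, masses) if lo <= x))
    return np.array(lower), np.array(upper)
```

By construction the lower bound never exceeds the upper bound, and both are step functions that jump only at interval endpoints, which is why no distribution-type assumption is needed.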

CMADT under the p-Box
The performance degradation model of the parameter P under normal stress, derived from the accelerated storage test data, yields a set of time-section data a_i = {a_{i1}, a_{i2}, · · · , a_{iN}} at moment t_i, where N is related to the sample size of the accelerated storage test and to the performance degradation modeling method used. Generally, the sample size of the equivalent data obtained from the evaluation of accelerated storage test data is relatively large. For example, in a high-temperature accelerated storage test with three stress levels and five samples per stress level, the performance degradation trajectory method yields 5^3 = 125 degradation curves under normal stress, so the sample size of the time-section data at moment t_i is 125. Therefore, the probability distribution F^a_{t_i}(x) of the time-section data can be described well by hypothesis testing with commonly used distribution types such as the normal and log-normal distributions.
As the p-box is used to express the small-sample natural storage data in this paper, the CMADT calculated by Equation (2) becomes an interval number ρ_i = [ρ̲_i, ρ̄_i] with the following lower and upper bounds:

ρ̲_i = min_{F ∈ [F̲^b_{t_i}, F̄^b_{t_i}]} A(F^a_{t_i}(x), F(x)) / A_0(F^a_{t_i}(x)),
ρ̄_i = max_{F ∈ [F̲^b_{t_i}, F̄^b_{t_i}]} A(F^a_{t_i}(x), F(x)) / A_0(F^a_{t_i}(x)),

where [F̲^b_{t_i}, F̄^b_{t_i}] is the p-box envelope of the natural storage data constructed above.
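A numerical sketch of the interval CMADT: clamping the model CDF into the envelope gives the best case, while taking the farther envelope bound at every point gives a conservative worst case (an upper bound, since a single admissible CDF may not attain the farther bound everywhere). The normal-model A_0 and all names are illustrative assumptions.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cmadt_interval(mu, sigma, xs, f_lower, f_upper):
    """[rho_lo, rho_hi]: CMADT of a normal model against a p-box
    [f_lower, f_upper] sampled on the grid xs.
    A0 = sigma*sqrt(2/pi) assumes a normal accelerated-test model."""
    f_a = np.array([normal_cdf(x, mu, sigma) for x in xs])
    # best case: distance to the admissible CDF closest to F^a at every x
    best_gap = np.maximum(0.0, np.maximum(f_lower - f_a, f_a - f_upper))
    # conservative worst case: distance to the farther envelope bound
    worst_gap = np.maximum(np.abs(f_a - f_lower), np.abs(f_a - f_upper))
    trap = lambda g: float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(xs)))
    a0 = sigma * math.sqrt(2.0 / math.pi)
    return trap(best_gap) / a0, trap(worst_gap) / a0
```

When the model CDF lies entirely inside the envelope, the lower bound is zero, i.e., some admissible natural-storage distribution agrees perfectly with the accelerated model.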

Single Key Parameter of the Overall Confidence Evaluation
The previous two sections discussed the credibility evaluation metrics of the equivalent data derived from accelerated storage tests at a single point in time. In general, benchmark data often exist for multiple time points of the test data, so an integrated overall credibility metric that combines the credibility of multiple time points needs to be investigated.
To ensure the normalized character of the overall credibility index, this paper performs probability statistics on the CMADT values of the n test points and takes the confidence limits of the credibility at confidence level γ as the overall credibility index of an individual key performance parameter. According to Equation (17), the CMADT value ρ_i = [ρ̲_i, ρ̄_i] can be obtained for all n test points, forming the two series

ρ_L = (ρ̲_1, ρ̲_2, · · · , ρ̲_n), ρ_U = (ρ̄_1, ρ̄_2, · · · , ρ̄_n).  (15)

Let the confidence level of the credibility assessment be γ = 1 − α, with α the significance level. For ρ_L, if the series passes a normality test (e.g., a K-S test), the confidence interval of the n CMADT values at confidence level γ can be computed from the normal distribution as

[ρ_L^−, ρ_L^+] = [µ_{ρ,L} − Z_{1−α/2} σ_{ρ,L}, µ_{ρ,L} + Z_{1−α/2} σ_{ρ,L}],  (16)

where µ_{ρ,L} and σ_{ρ,L} are the mean and standard deviation of the series ρ_L, and Z_{1−α/2} can be obtained from the standard normal distribution table; values for the significance levels α most widely used in practice are provided in Table 1. The calculation for ρ_U is similar and is not repeated; it yields ρ_U^− and ρ_U^+. To be conservative, the upper confidence limit is used as the credibility evaluation of this accelerated storage test, i.e., the overall credibility of the single key performance parameter (CMADT of a single parameter, CSP) at confidence level γ is given by Equation (21). The index obtained by Equation (21) is dimensionless, which is convenient for the horizontal comparison of multiple key performance parameters of the same product or of the credibility of accelerated storage test data for different products.
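The normal-theory interval of Equation (16) is straightforward to compute; the table of quantiles below reproduces the standard values of z_{1−α/2} for the significance levels usually listed in tables such as Table 1 (function names are illustrative).

```python
import math

# z_{1-alpha/2} for significance levels commonly used in practice
Z_TABLE = {0.20: 1.2816, 0.10: 1.6449, 0.05: 1.9600}

def normal_confidence_interval(series, alpha):
    """[mu - z*s, mu + z*s] for a CMADT series that passed a normality
    test (cf. Eq. (16)); s is the sample standard deviation."""
    n = len(series)
    mu = sum(series) / n
    s = math.sqrt(sum((x - mu) ** 2 for x in series) / (n - 1))
    z = Z_TABLE[alpha]
    return mu - z * s, mu + z * s
```

Applied to each of the two series ρ_L and ρ_U, this yields the four limits ρ_L^−, ρ_L^+, ρ_U^−, and ρ_U^+ used in the CSP construction.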
If the data series cannot pass the normality test, the kernel density estimation (KDE) method [28] can be used to calculate the confidence limits of the credibility at confidence level γ. Kernel density estimation is a nonparametric method that can be used for distribution estimation when the distribution is non-normal and nonstandard.
For a set of data {s(1), s(2), . . . , s(N)}, the probability density function is estimated as

f̂(x) = (1 / (N h)) Σ_{i=1}^{N} k((x − s(i)) / h),

where h is the window width and k(x) is the kernel function, which satisfies k(x) ≥ 0 and ∫ k(x) dx = 1. In this paper, we choose the classical Gaussian kernel function.
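When normality fails, the confidence limits can be taken from the quantiles of the Gaussian-kernel estimate. The sketch below uses Silverman's rule of thumb for the window width h (a standard default, assumed here rather than taken from the paper) and inverts the KDE's CDF by bisection; names are illustrative.

```python
import math

def kde_cdf(x, samples, h):
    """CDF of a Gaussian KDE: the average of per-sample normal CDFs."""
    return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
               for s in samples) / len(samples)

def kde_quantile(p, samples, h, lo=-100.0, hi=100.0):
    """Invert the KDE CDF by bisection (the CDF is monotone)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, samples, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def kde_confidence_interval(samples, alpha):
    n = len(samples)
    mu = sum(samples) / n
    sd = math.sqrt(sum((s - mu) ** 2 for s in samples) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)          # Silverman's rule of thumb
    return (kde_quantile(alpha / 2.0, samples, h),
            kde_quantile(1.0 - alpha / 2.0, samples, h))
```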
Mathematics 2023, 11, 2511

For the data series ρ = [ρ_L, ρ_U], after obtaining the KDE estimate, the confidence interval at confidence level γ is obtained from the quantiles of the estimated distribution. In this case, the CSP evaluation of the single key performance parameter at confidence level γ follows in the same way as above.

Overall Credibility Evaluation of Multiple Key Parameters
The above discussion is for a case in which the key parameters of the product are single parameters. However, in practice, multiparameter degradation is also common [29][30][31][32]. Let the product have m key performance parameters, which are denoted as P 1 , P 2 , . . . , P m , and the confidence of the accelerated storage test for m parameters is ρ (1) , ρ (2) , . . . , ρ (m) according to the method described in Section 3.3. To obtain the confidence of the whole accelerated storage test, the weights of the m parameters need to be calculated, and the total confidence index is obtained by the weighted average method.
The dynamic time warping (DTW) distance [33,34] is a well-performing time-series similarity measure suited to this paper's scenario, in which different key performance parameters evolve over time. Therefore, the DTW distance is used here to measure the similarity between the individual response quantities and thus to determine the contribution (i.e., weight) of each key performance parameter in the calculation of the overall confidence.
From the above definition, the DTW distance between two time series is obtained by finding the minimum-cost alignment path between them. The DTW distance of two time series X and Y can be calculated recursively using Equation (28):

D(i, j) = d(x_i, y_j) + min{D(i − 1, j), D(i, j − 1), D(i − 1, j − 1)},  (28)

where d(x_i, y_j) is the local distance between the points x_i and y_j.
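A minimal implementation of the recursion, using the absolute difference as the local distance (an illustrative choice):

```python
import math

def dtw_distance(x, y):
    """Dynamic time warping distance between sequences x and y via the
    classic DP recursion D(i,j) = |x_i - y_j| + min(up, left, diag)."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Unlike the Euclidean distance, DTW tolerates local time shifts: a sequence and a version of it with a repeated sample still have distance zero.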
The correlation calculation of the product's m key parameters based on the DTW distance is provided below.
Step 1: Let the time series obtained during the degradation trials of two key parameters P_i and P_j be y_i and y_j (i, j = 1, 2, · · · , m), and calculate the DTW distance of y_i and y_j according to the above method. The result is denoted d_{i,j}.
Step 2: Repeat Step 1 to obtain the DTW distance between every pair of the m key parameters, denoted as the matrix d = (d_{i,j})_{m×m}.
Step 3: Normalize the matrix according to Equation (26).
Step 4: Sum the rows of the normalized DTW distance matrix d to obtain the weight coefficient ω_j of the jth key parameter (j = 1, 2, · · · , m).
Step 5: Let the credibility values of the m key parameters be η_1, η_2, · · · , η_m, and use the weighted average method to obtain a uniform credibility measure for the accelerated storage test of the entire product, that is,

η̂ = Σ_{j=1}^{m} ω_j η_j.
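Steps 1-5 can be sketched end to end. The normalization (dividing each distance by the matrix total) and the row-sum weighting below are plain readings of the steps above; the exact forms of Equations (26) and (27) may differ, so treat this as an illustrative sketch.

```python
import math

def dtw_distance(x, y):
    """Classic DTW dynamic program (see Eq. (28))."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def overall_credibility(series, etas):
    """Steps 1-5: pairwise DTW matrix -> normalize by the total ->
    row-sum weights -> weighted average of per-parameter credibilities."""
    m = len(series)
    d = [[dtw_distance(series[i], series[j]) for j in range(m)] for i in range(m)]
    total = sum(sum(row) for row in d)
    w = [sum(d[i][j] for i in range(m)) / total for j in range(m)]
    eta_hat = sum(wj * ej for wj, ej in zip(w, etas))
    return w, eta_hat
```

The weights sum to one by construction, so η̂ is a convex combination of the per-parameter credibilities.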

Expert Systems for Credibility Assessment
To help designers and decision-makers judge the credibility of accelerated storage test results more intuitively, this paper establishes the credibility evaluation levels of the accelerated storage test shown in Table 2, dividing the evaluation results into four levels: "excellent," "good," "medium," and "poor." The confidence level of the accelerated storage test is determined by calculating the similarity between the evaluated credibility interval ϕ and the level intervals Ω_i in Table 2, where i = 1, 2, 3, 4 corresponds to "excellent," "good," "medium," and "poor," respectively. The similarity is calculated using the Euclidean distance of Equation (13), which is defined in Section 3.1; the four similarity values ξ_i (i = 1, 2, 3, 4) are obtained, and the key performance parameter is assigned the confidence level with the greatest similarity.
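The level assignment can be sketched with interval similarity. The level boundaries below are placeholders (Table 2's actual boundaries are not reproduced in this text), and the similarity ξ = 1/(1 + L_2) is one common decreasing function of the interval distance; both are assumptions for illustration.

```python
import math

# Placeholder level intervals for the credibility value; Table 2's
# actual boundaries are assumptions here, chosen only for illustration.
LEVELS = {"excellent": (0.0, 0.1), "good": (0.1, 0.3),
          "medium": (0.3, 0.6), "poor": (0.6, 1.0)}

def interval_l2(d, e):
    """Euclidean distance between interval numbers D and E."""
    return math.sqrt((d[0] - e[0]) ** 2 + (d[1] - e[1]) ** 2)

def credibility_level(rho_interval):
    """Pick the level whose interval is most similar to the evaluated
    credibility interval, with similarity xi = 1 / (1 + L2)."""
    sims = {name: 1.0 / (1.0 + interval_l2(rho_interval, omega))
            for name, omega in LEVELS.items()}
    return max(sims, key=sims.get)
```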

Use Case
A quartz accelerometer is a typical inertial device that is widely used in inertial navigation systems. The quartz accelerometer experiences accuracy drift during long-term storage, which affects its reliability in use. To evaluate the storage life of quartz accelerometers, a high-temperature accelerated storage test was conducted. The specimens were divided into three groups; T_1, T_2, · · · , T_10 denote the test cycles, each test cycle lasted 80 h, and accelerated storage tests were carried out at 60 °C, 72 °C, and 85 °C for approximately 800 h, with sample sizes of three, five, and five, respectively. The key performance parameters of the quartz accelerometer were the output voltage degradation at the 0° position, the output voltage degradation at the 180° position, and the output voltage deviation of the centrifugal test, denoted P_1, P_2, and P_3, respectively; the failure threshold is D_f = 1.6 mV. The performance degradation data of the three parameters are shown in Tables 3-5.

Table 3. The output voltage degradation at the 0° position of the quartz accelerometer.

Table 4. The output voltage degradation at the 180° position of the quartz accelerometer.

The degradation curves of the quartz accelerometer at the different stress levels are drawn from the degradation data of the three key performance parameters.

Furthermore, the degradation curves of the three key performance parameters under the normal stress level are obtained, as shown in Figure 4. Additionally, a batch of eight quartz accelerometers manufactured in 2010 was tested at the factory and then once a year from 2016 to 2021. The test data are expressed as degraded quantities (i.e., the test value minus the initial value), and the results are shown in Tables 6-8. An accelerated storage test evaluation of the quartz accelerometer was carried out, and the three key performance parameters for 6-11 years of equivalent storage obey normal distributions, whose parameters are shown in Table 9. The probability distributions of the key performance parameters of the quartz accelerometer during natural storage are then calculated. For P_1, the p-box for storage from 6 to 11 years can be constructed according to Equations (4)-(12), as shown in Figure 5. Similarly, the p-boxes of the performance degradation data of P_2 and P_3 at each storage time can also be obtained; they are not listed here due to space limitations.
The confidence indices of the individual key performance parameters at each natural storage moment are calculated next. First, the plausibility index of P_1 at t = 6a is computed. According to the p-box of P_1 under natural storage at t = 6a and the probability distribution of its equivalent 6a storage data (as shown in Figure 6), Equations (1) and (2) yield the upper and lower bounds of the normalized plausibility metric CMADT. Similarly, the plausibility metrics can be calculated for P_1 from t = 7a to t = 11a, and the procedure for the two key performance parameters P_2 and P_3 is analogous. Table 10 shows the calculated credibility metrics of the three key performance parameters of the quartz accelerometer.

Table 10. Calculation results of the credibility metrics.
The overall confidence evaluation index CSP of each single key performance parameter is calculated next. According to the K-S test, the parameter P_1 in Table 10 follows a normal distribution, whereas P_2 and P_3 do not, so kernel density estimation is used to calculate their statistical properties. Using the confidence level γ = 0.8, α = 1 − γ = 0.2, and Z_{1−α/2} = 1.2816, Equations (15) and (16) give the evaluation result η = [η_L, η_U] for P_1 as

η_L = µ_{ρ,L} − σ_{ρ,L} Z_{1−α/2} = 0.1752, η_U = µ_{ρ,U} − σ_{ρ,U} Z_{1−α/2} = 0.1865.

P_2 and P_3 use kernel density estimation to obtain their evaluation results η = [η_L, η_U]. From Equations (24) and (26)-(28), the weight coefficients of the key parameters are then obtained.
The weight coefficients obtained include ω̂_1 = 0.2277 and ω̂_3 = 0.5570. From Equation (25), the weighted average method then gives a uniform confidence measure η̂ for the accelerated storage test of the entire product, and according to Table 2, the confidence level of the accelerated storage test is "good."

Conclusions
In many engineering projects involving AST, although great effort has been expended and various mathematical methods have been used to pursue accurate evaluation results, users still question whether results obtained under accelerated stress can truly reflect the life of the product. As a result, AST results are often questioned in engineering practice, yet there seems to be little discussion of this in academia.
To evaluate the credibility of ASTs, this paper adopts the idea of the area metric to construct an area metric for ASTs, using natural storage test data as the benchmark. On this basis, a normalized, dimensionless credibility metric for ASTs, CMADT, is proposed; a percentage from "100%" to "0%" represents model credibility from best to worst. Based on the concept of performance importance, multiple single-point and single-parameter CMADTs are combined into one metric that reflects the credibility of the AST results for the overall product.
The normalized, dimensionless index CMADT proposed in this paper is of great significance. It addresses the challenge that conventional methods cannot quantitatively assess AST results. CMADT not only allows designers and decision-makers to judge the credibility of an AST intuitively, but also allows the horizontal comparison of AST results for different products. The quantitative CMADT evaluation is translated in Table 2 into expressions suited to human judgment, i.e., "excellent," "good," "medium," and "poor," which can help senior decision-makers make better judgments. In addition, CMADT is applicable to different situations, such as those with large samples, small samples, and very small samples, and it has good generality. Due to the lack of similar research results, it is difficult to compare our results with those of existing methods. If there are any inaccuracies in the viewpoints of this article, we hope readers will point them out and explore them.

Data Availability Statement:
The data presented in this study are available within this paper.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.