Insulation condition ranking of transformers through principal component analysis and analytic hierarchy process

Interpretation of oil test data for transformer insulation condition is essential for justifying asset management practices. Traditionally, an empirical formula (EF) is used by asset managers. This study introduces principal component analysis (PCA) and analytic hierarchy process (AHP) as two alternatives. Using an oil test dataset consisting of 39 in-service UK transmission transformers measured for multiple ageing-related parameters, PCA demonstrated its potential in working directly with data to explore parameter relations as well as ranking transformers according to their conditions. AHP, on the other hand, presented a way to coherently aggregate criteria in a flexible hierarchical setup for identifying the weightages of the oil test parameters before interpretation of measurements. The conditions interpreted by PCA and AHP are similar to those from a track-record proven EF, particularly for transformers at the extreme ends of the insulation condition.


Introduction
Managing a large fleet of ageing transformers is a big challenge for electrical utilities. To optimise capital expenditure while upholding power system reliability, a condition-based approach to asset management is widely adopted, in which knowledge of insulation ageing is used to prioritise maintenance or corrective actions [1][2][3][4][5]. This involves interpreting transformer insulation condition from oil test data that contain records of moisture and acidity, among other parameters [3,5][6][7][8].
For that purpose, a health index, formulated by empirical experience or expert judgement is often used. This is done by aggregating the individual scores and weightages predefined for each parameter into a quantitative value to advise asset managers on the transformer insulation condition. Examples can be found in [1,5]. Nonetheless, this health index formulation is dependent on expert knowledge, operational experience and forensic experience involving scrapping data. These might not be readily available to all utilities and could hence hamper formulation of a representative insulation health index.
In the light of the large size of oil test databases and the number of ageing or condition indicators available, research on how to more coherently formulate representative health index scores for a large fleet of in-service transformers would be useful towards facilitating decision making on key actions such as the replacement or refurbishment of transformers. This has culminated in a progressive shift towards the use of intelligent mathematical procedures as reported in [9][10][11][12]. This paper will introduce the application of two techniques which are principal component analysis (PCA) and analytic hierarchy process (AHP) in ranking in-service transformers according to their insulation condition data.

Methodology
The oil test data contain parameters of breakdown voltage (BDV), moisture, acidity, 2-furfural (2-FAL), dielectric dissipation factor (DDF), resistivity, interfacial tension (IFT) and colour. A total of 39 in-service UK transmission transformers with the measurement records of the mentioned parameters in the year of 2012 were selected for this work. The data are found in Table 1. Besides the measurement records of the eight parameters, the in-service age of the transformers is also shown. These transformers, without undergoing any oil treatment, are free breathing, insulated with ordinary Kraft paper and mineral oil.
As in Fig. 1, these data are subjected to PCA and AHP for transformer insulation condition ranking. This forms the major part of the work. A track-record proven empirical formula (EF) used by the utility that has provided the data will also be applied to facilitate discussion among the condition rankings interpreted.

Principal component analysis
PCA is a technique that reduces the dimensions of a dataset with a large number of correlated variables [13,14]. While keeping the original data variation, PCA transforms the original correlated variables into a new set of uncorrelated variables known as principal components (PCs) [13,14]. Generally, n number of original variables will result in n number of PCs. Dimension reduction could be achieved when the first k number of the PCs are selected for representing the original data (where k < n).
Owing to its long history, PCA is widely applied in various fields, from neuroscience, image processing, chemometrics to partial discharge analysis [14][15][16]. In this work, PCA will be applied primarily to capture the essence of a set of oil test data for transformer insulation condition ranking.
The following describes the PCA process used in this paper, which is adapted from [13,14,16]. Let X_raw be a dataset of m entries each consisting of n original variables (e.g. the oil test data used in this paper have 39 transformers with records of eight parameters). To avoid overemphasising variables recorded on a larger scale range, each value is mean-centred and scaled using the mean and standard deviation evaluated for each variable as in (1) [13,14,16]. The resulting matrix, X, is shown in (2).
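The mean-centring and scaling step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `standardise` and the sample records are hypothetical, and the use of the sample standard deviation (m − 1 denominator) is an assumption chosen to match the (m − 1) factor that appears later in the covariance of Y.

```python
from statistics import mean, stdev


def standardise(x_raw):
    """Mean-centre and scale each column (variable) of a row-major dataset."""
    columns = list(zip(*x_raw))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (m - 1 denominator).
    stds = [stdev(col) for col in columns]
    return [
        [(value - mu) / sd for value, mu, sd in zip(row, means, stds)]
        for row in x_raw
    ]


# Hypothetical records: three units, two variables on very different scales.
x_raw = [[10.0, 0.02], [20.0, 0.04], [30.0, 0.09]]
x = standardise(x_raw)
```

After this step every variable has zero mean and unit standard deviation, so a parameter recorded in large units no longer dominates the components simply because of its scale.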
Equation (3) shows Y as the PC dataset with the same m entries corresponding to k PCs after linearly combining X with A. Note that again before k components are chosen, k = n. Essentially, A contains the individual weightages for each of the n original variables where the first column denotes the set of weightages for linearly combining the n original variables into the first PC, second column for the second PC up until the kth column for the kth PC.
The key is to evaluate A, which is done by eigenvector decomposition or singular value decomposition (SVD) [13,16]. SVD, which is used in this work, is known as a more general solution method as it can also be applied to matrices that are not square [16,17]. As in (4), SVD involves decomposing the matrix X (m × n) into three components, namely U (m × r orthogonal matrix), S (r × r diagonal matrix) and V (n × r orthogonal matrix) [13,16,17]. In other words, they represent a rotation, a stretch and a secondary rotation [16]. Knowing X^T X is symmetric and hence diagonalisable, S and V can first be obtained through calculation of the eigenvalues and eigenvectors of X^T X [17]. From (5), the diagonal entries of S are simply the square roots of the eigenvalues and the columns of V are the eigenvectors of X^T X. Subsequently, U is obtained from (6).
These three component matrices, U, S and V, are directly related to Y = XA as discussed shortly. Since PCA results in a new set of uncorrelated variables, the covariance matrix of Y, denoted by C_Y, is a diagonal n × n matrix (before k components are chosen, i.e. k = n) [16,17]. By substituting Y = XA, C_Y can be expressed as a function of the covariance of X as in (7). Then, (8) is obtained by substituting (5) into (7) and recognising that A must equal V for C_Y to be diagonal [16,17]. Therefore, the component matrix V evaluated from SVD is actually the A matrix [16,17]. The covariance of Y is simply the component matrix S^2 divided by (m − 1) [16,17]. Finally, the Y matrix itself can be evaluated through the product of X and A, or equivalently the product of the component matrices U and S, as seen in (9) based on the SVD relation in (4) [16,17].
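The chain of relations described in the two paragraphs above can be written out as a short derivation. It is reconstructed from the prose, in the order in which the text refers to the relations, and assumes that the columns of U and V are orthonormal (U^T U = V^T V = I) and that all singular values are non-zero so that S is invertible.

```latex
\begin{aligned}
Y &= XA \\
X &= U S V^{T} \\
X^{T}X &= V S U^{T} U S V^{T} = V S^{2} V^{T} \\
U &= X V S^{-1} \\
C_{Y} &= \frac{1}{m-1}\, Y^{T} Y = \frac{1}{m-1}\, A^{T} X^{T} X A \\
C_{Y} &= \frac{1}{m-1}\, A^{T} V S^{2} V^{T} A = \frac{1}{m-1}\, S^{2}
        \quad \text{when } A = V \\
Y &= XA = U S V^{T} V = U S
\end{aligned}
```

The choice A = V is what makes C_Y diagonal, because V^T V reduces to the identity on both sides of S^2.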
Note that the A matrix obtained directly from SVD would need to have a sign convention enforced which is done by multiplying the sign of the largest coefficient in each column with all the coefficients in that particular column [18,19].
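A minimal sketch of such a sign convention is given below. The function name `enforce_sign_convention` and the example matrix are hypothetical, and "largest coefficient" is taken here to mean the coefficient of largest absolute value, which is an assumption.

```python
def enforce_sign_convention(a):
    """Flip each column so that its largest-magnitude coefficient is positive."""
    n_rows, n_cols = len(a), len(a[0])
    result = [row[:] for row in a]
    for j in range(n_cols):
        column = [a[i][j] for i in range(n_rows)]
        # Assumption: "largest" is judged by absolute value.
        dominant = max(column, key=abs)
        sign = -1.0 if dominant < 0 else 1.0
        for i in range(n_rows):
            result[i][j] = sign * a[i][j]
    return result


# Hypothetical 2 x 2 weightage matrix whose columns both need flipping.
a = [[0.6, -0.8], [-0.8, -0.6]]
a_fixed = enforce_sign_convention(a)
```

If a column of A is flipped, the matching column of Y has to be flipped as well so that Y = XA continues to hold.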

Analytic hierarchy process
AHP is a decision making technique that involves creating a hierarchy to assess a set of alternatives pertaining to a certain aim [20,21]. Due to its easy and coherent approach that caters for both qualitative and quantitative evaluation, as well as its comprehensive nature of combining insights, intuition and experience with mathematics and data [21][22][23][24][25], this relatively new technique has been applied extensively in wide ranging fields such as defence, aviation, health, IT, manufacturing, banking and, more recently, power systems [20,22,24][25][26][27][28]. In this work, AHP will be applied to the oil test parameters to devise a set of weightages for those parameters. Those weightages will then be applied to the measurements of the 39 in-service transformers for insulation condition ranking. Fig. 2 illustrates the concept of AHP. Note that the structure and complexity of the hierarchy are at the user's discretion and could vary for different applications. Generally, the evaluation of a set of p alternatives pertaining to a certain aim (such as ranking oil test parameters for their suitability in representing transformer insulation condition) is done with respect to a set of q criteria, which in turn could be assessed based on a set of r sub-criteria (if required) [20][21][22].
With the hierarchy established, the subsequent step involves pairwise comparison of each pair of elements at each level with reference to the level directly above [20]. For instance, at the highest level, the preference of Criterion 1 over Criterion 2 with reference to the aim is to be assessed. Such a qualitative pairwise comparison requires the use of a scale for translating subjective preference into numbers [20]. Table 2 shows two possible scales fit for this purpose.
The traditional linear scale originally mooted by Saaty [20] is the most commonly used. In spite of its relative ease, this scale is reported to produce unevenly dispersed local weights that could undermine sensitivity when comparing elements that are close to one another which can be overcome through the use of a balanced scale [22,23]. The work described in this paper will hence adopt the balanced scale.
With the scale, comparison matrices can be used to encapsulate all the pairwise comparisons performed for each pair of elements at each level. A comparison matrix, C has a form as shown in (10).
The diagonal values are always unity as they simply represent comparing an element with itself. The off-diagonal elements are separated into lower and upper triangular halves, whereby the values in one half are simply the reciprocals of those in the other half since the elements being compared are the same [20]. Each comparison matrix is subsequently converted to local weightages. For example, a q × q comparison matrix for the first level in Fig. 2 is converted to q local weightages, one for each of the criteria. This conversion from a comparison matrix to local weightages is done via either the traditional evaluation of the matrix eigenvector or geometric mean evaluation (logarithmic least squares method) [20,22].
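The structure of such a comparison matrix can be sketched as follows. The helper `build_comparison_matrix` and the judgement values are hypothetical and serve only to show the unit diagonal and the reciprocal halves; they are not the values used in this paper.

```python
def build_comparison_matrix(size, upper):
    """Fill a reciprocal pairwise comparison matrix from upper-triangle judgements.

    `upper` maps (i, j) with i < j to the preference of element i over element j.
    """
    c = [[1.0] * size for _ in range(size)]  # unit diagonal: element vs itself
    for (i, j), value in upper.items():
        c[i][j] = value
        c[j][i] = 1.0 / value  # the mirrored entry is the reciprocal
    return c


# Hypothetical judgements for three criteria (illustrative values only).
judgements = {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0}
c = build_comparison_matrix(3, judgements)
```

Only one triangular half needs to be elicited, so a q × q matrix requires q(q − 1)/2 pairwise judgements.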
Due to the potential occurrence of rank reversal issues with the eigenvector evaluation method [22], the geometric mean will be used in this work. As the comparison matrix C is a z × z square matrix, each weightage is evaluated by calculating the zth root of the product of all elements in a row or in a column, before normalising it [22]. Written for the rows of C, with c_ij denoting the element in row i and column j, this is shown as

$$w_i = \frac{\left(\prod_{j=1}^{z} c_{ij}\right)^{1/z}}{\sum_{k=1}^{z}\left(\prod_{j=1}^{z} c_{kj}\right)^{1/z}} \qquad (11)$$

With the weightages of the elements at each level evaluated, the hierarchy in Fig. 2 will be filled with local weightages, w. Subsequently, the global weightages, W, of each element in the hierarchy can be evaluated through additive aggregation as in (12) [20,22]. In essence, the global weightage of an element, W, is the sum of the products between the global weightage of a reference on the upper level, W_upper level, and the local weightage of that element on the current level with respect to that reference, w_@upper level.

$$W = \sum_{\text{upper level}} W_{\text{upper level}} \, w_{@\text{upper level}} \qquad (12)$$
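The two steps described above, geometric mean evaluation of local weightages followed by additive aggregation into global weightages, can be sketched as below. The function names and all numerical values are hypothetical; the comparison matrix is deliberately a perfectly consistent one (ratios 4:2:1) so that the result is easy to check by hand.

```python
import math


def geometric_mean_weights(c):
    """Local weightages: z-th root of each row product, normalised to sum to 1."""
    z = len(c)
    roots = [math.prod(row) ** (1.0 / z) for row in c]
    total = sum(roots)
    return [root / total for root in roots]


def global_weights(upper_weights, local_weights_by_reference):
    """Additive aggregation over the references on the upper level."""
    n_elements = len(local_weights_by_reference[0])
    return [
        sum(w_up * local[k]
            for w_up, local in zip(upper_weights, local_weights_by_reference))
        for k in range(n_elements)
    ]


# Hypothetical comparison matrix for three criteria (illustrative values only).
c = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
w_criteria = geometric_mean_weights(c)

# Hypothetical local weightages of two alternatives under each criterion.
w_local = [[0.5, 0.5], [0.75, 0.25], [0.2, 0.8]]
w_global = global_weights(w_criteria, w_local)
```

For this matrix the row products are 8, 1 and 0.125, whose cube roots are 2, 1 and 0.5, giving local weightages of 4/7, 2/7 and 1/7. Because each set of local weightages sums to one, the global weightages also sum to one.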
Through this AHP evaluative process, a set of weightages can be assigned to the oil test parameters which subsequently can be used on the individual transformer oil test measurements for insulation condition ranking.

Results
With the two procedures proposed for transformer insulation condition ranking described, their application to the oil test data (39 transformers with eight parameters) will be shown in this section. Two test statistics were evaluated first [30,31]; both, as seen in Fig. 3, suggest that PCA is suitable for the original dataset. After performing PCA, eight PCs were obtained which individually explain different percentages of the variance in the original dataset. Both the individual percentage and the cumulative percentage of the variance explained (VE) by each PC are also shown in Fig. 3. A common way of interpreting the resulting PCs is through a bi-plot. For visualisation purposes, dimension reduction involving just three PCs will be used here, illustrated by the three-dimensional bi-plot in Fig. 4. There will be an inevitable loss of representativeness, as the three PCs account for only about 75% of the original data variance for this particular dataset. Nonetheless, this three-dimensional bi-plot allows a quick and direct interpretation of the relative insulation condition of the individual transformers based on just the three PCs, as well as how these three PCs relate back to the original variables [14].
This bi-plot was done by first plotting the values of the three PCs of the eight original variables (values obtained from the first three columns of the A matrix described in Section 2.1). What were plotted next are the values of the three PCs of the 39 individual units (first three columns of the Y matrix), which were normalised by the maximum in the three PCs for the 39 units and scaled to the maximum length out of the eight original variable representations in the three dimensional space. Note that sign convention was enforced for both original variables and individual units representations [14].
From Fig. 4, PC1 represents colour, IFT, resistivity, acidity and DDF well in terms of their PC1 magnitude, to be followed by 2-FAL, moisture and BDV. Considering the signs in PC1, moisture, DDF, colour, acidity and 2-FAL behave in an opposite manner to BDV, IFT and resistivity. These observations are useful towards not just indicating the behaviour of the oil test parameters with age, but also a rough estimation of how well they represent ageing. PC2 and PC3 can be interpreted similarly but PC1 accounts for most of the original data variance.
More importantly, the bi-plot allows users to quickly identify transformers that are the best and the worst in terms of their insulation conditions. Simply, the origin indicates the average condition, whereas the transformers further away from the origin represent either a better condition or a worse condition. For instance, T38 is deemed to be the best condition transformer, whereas T20 is the worst as it has high positive values of PC1, PC2 and PC3 which can also be interpreted by projecting orthogonally this T20 point with respect to each of the lines representing the original variables.
Apart from graphical interpretation of the bi-plot, a value referred to here as the PCA rank was calculated as in (13) by aggregating the normalised and scaled versions of the three PCs with respect to their VE as illustrated in Fig. 3. The incorporation of the VE information resembles a form of weightage allocation. Note that instead of just using three PCs (following from the three-dimensional bi-plot visualisation), this PCA rank calculation could have incorporated all eight PCs. Nevertheless, the difference between the two is small and the PCA rank based on three PCs will be used in this work. Table 3 shows the PCA ranks in decreasing order, with high values indicating a poorer condition. It can be seen that, for instance, the worst three transformers are T20, T30 and T4, whereas the best three transformers are T38, T9 and T13.
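One plausible reading of this VE-weighted aggregation is sketched below. It should not be taken as the paper's equation (13): the function `pca_rank`, the scores and the VE values are all hypothetical, the normalisation by a single maximum absolute score across the retained PCs is an assumption, and the constant scaling applied for the bi-plot is omitted because it does not change the ordering.

```python
def pca_rank(scores, variance_explained):
    """Aggregate PC scores into one value per unit, weighting each PC by its VE."""
    # Assumption: normalise by the largest absolute score over all units and PCs.
    global_max = max(abs(value) for row in scores for value in row)
    return [
        sum(ve * value / global_max for ve, value in zip(variance_explained, row))
        for row in scores
    ]


# Hypothetical PC1-PC3 scores for three units and hypothetical VE fractions.
scores = [[2.0, 1.0, 0.5], [-4.0, 0.5, 0.0], [0.5, -1.0, 1.0]]
ve = [0.50, 0.15, 0.10]
ranks = pca_rank(scores, ve)

# Units ordered from poorest to best condition (high value = poorer condition).
order = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
```

Because PC1 carries the largest VE in this illustration, a unit with a strongly positive PC1 score ends up at the poor end of the list, which mirrors the role PC1 plays in the bi-plot interpretation.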

AHP implementation
AHP was first used in parameter ranking to determine the individual weightages of each oil test parameter. Similar to a classical EF, the weightages were then applied to the oil test data for condition ranking. Fig. 5 shows the AHP setup for parameter ranking. The criteria selections and the pairwise comparison evaluations were facilitated by the experience and judgement gained from previous studies on analysing large oil test databases [6,8,32,33] as well as literature [34][35][36][37].
In this work, four criteria were chosen to aid parameter ranking. Measurement reliability (MR) relates to the confidence on the parameter measurements, for instance the measurement principles, inherent stability and history. The second and third criteria are more straightforward in terms of how well the parameters represent oil and paper condition, respectively. The final criterion is correlation with age that indicates how sensitive the parameters are to transformer age. The pairwise comparisons among the four criteria were qualitatively determined as seen in Table 4. The four criteria are represented by their initials for simplicity.
Oil condition representation (OCR) and paper condition representation (PCR) were judged to be slightly more important than MR as the aim is about condition ranking of insulation consisting of oil and paper. Similarly, correlation with transformer age (CTA) which reflects how well the parameters represent ageing was deemed to be more important than MR. These inputs into comparing OCR, PCR and CTA based on MR are reflected by the second to fourth entries in the first column; which are also reflected by the second to fourth entries in the first row. Besides that, the comparison matrix was completed through acknowledging that paper health is more influential towards transformer usability than the oil health as the oil is more easily cleaned or replaced. Finally, PCR is regarded as important as CTA, which was reflected by the fourth entry of the fourth column.
With the elements in the comparison matrix recorded, the resulting weightages pertaining to criteria ranking (CR) were computed as shown in the last row.
Moving down the hierarchy in Fig. 5 to evaluate the preference of the parameters with respect to the four criteria, Tables 5-7 show the pairwise comparison values and resulting weightages. Note that the eight parameters are represented by letters for simplicity, as seen in Fig. 5. Any information gathered from previous studies and literature helps to increase or reduce the importance of the parameters with respect to a particular criterion.
Pertaining to MR as in Table 5, the reasoning that guided the selection of the pairwise comparison values is discussed in the following. Acidity and 2-FAL were regarded as the most reliable considering their wide usage among electrical utilities compared with other useful parameters such as DDF, resistivity and IFT, which are perhaps relatively less adopted [6,8]. Colour was thought to be the least reliable knowing how subjective colour determination is [6]. Moisture and BDV were deemed more reliable than colour in general but still less preferable considering potential moisture ingress, the temperature and seasonal influence on moisture, as well as the indirect effects these would have on BDV [8,32].
As for OCR, the values in Table 6 were based on knowing in general that acidity is linked directly with oil degradation products and can reflect well the crucial late ageing stages [6]. Among the other parameters, as moisture is not just produced by oil oxidation but also by paper hydrolysis, it would be less preferred in general, indirectly affecting BDV as well [32]. Most importantly, 2-FAL should always be the least preferred as it is a paper ageing product [36].
In the context of PCR, 2-FAL now has to be always the most preferred parameter. Apart from that rather obvious thought process into facilitating values determination in Table 7, it is also noted that even though most would reside in paper, paper ageing also produces moisture and acidic products that can be found in oil [36].
Unlike the previous criteria involving qualitative pairwise comparison, the final criterion of CTA was evaluated quantitatively or statistically via population trend analyses in the form of Spearman's correlation coefficient with transformer in-service age, S. The population trend analyses were performed on around 4500 in-service UK transformers operating at multiple voltage levels corresponding to about 65,000 oil test entries [6,8,33]. These coefficients as adapted from [6] are shown in Table 8. The weightages were then calculated through normalisation.
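The quantitative route described above can be sketched as follows: Spearman's coefficient is computed between each parameter and in-service age, and the coefficients are then normalised into weightages. All data below are hypothetical, and normalising the absolute values of the coefficients is an assumption made here so that parameters that decrease with age still receive positive weightages.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def normalise_to_weights(coefficients):
    """Assumption: weightages are proportional to the coefficient magnitudes."""
    magnitudes = [abs(value) for value in coefficients]
    total = sum(magnitudes)
    return [value / total for value in magnitudes]


# Hypothetical records for five units (not taken from the paper's database).
age = [5, 12, 20, 33, 41]
acidity = [0.01, 0.03, 0.02, 0.08, 0.15]
ift = [40, 35, 36, 22, 18]

s_acidity = spearman(age, acidity)
s_ift = spearman(age, ift)
weights = normalise_to_weights([s_acidity, s_ift])
```

In this illustration acidity rises with age and IFT falls with age to the same degree, so the two coefficients have equal magnitude and opposite sign, and the two parameters receive equal weightages.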
Note that the pairwise comparison values seen in Tables 5-7 demonstrated the possibility of incorporating subjective judgements whereas the values in Table 8 portrayed the incorporation of objective studies based on data. Such capabilities highlight the merits of AHP. In addition, the pairwise comparison values can be easily updated when more information is available. Moreover, the structure of the AHP formulated for the decision making process can also be easily updated.
By aggregating the weightages from the upper level Table 4 that dealt with the pairwise comparison among the four criteria, to all the other lower level tables examining the preference of the eight original oil test parameters pertaining to each of the criteria (from Tables 5-8), the global weightages or the global ranks of the parameters were calculated as can be found in Table 9. This set of weightages was then applied to the original oil test data (39 units with eight parameters) for insulation condition ranking.
The raw records in the original data were normalised according to the increasing or decreasing nature of the parameters with respect to insulation condition change. For instance, acidity measurements were divided by the maximum acidity within the dataset. With that, the 39 transformer condition rankings based on AHP are shown in decreasing order in Table 10. High values indicate a poorer condition whereas low values represent a better condition. In terms of AHP, the worst three transformers are T20, T4 and T30, whereas the best three transformers are T38, T9 and T8.
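Applying a set of weightages to normalised measurements can be sketched as below. The records and weightages are hypothetical, and while division by the maximum follows the acidity example given above, the treatment of parameters that decrease as the insulation degrades (here, the dataset minimum divided by the value) is an assumption, since the text does not state the exact form used.

```python
def condition_scores(records, weights, increases_with_ageing):
    """Weighted sum of normalised parameters; a high score means poorer condition."""
    columns = list(zip(*records))
    scores = []
    for row in records:
        total = 0.0
        for value, column, weight, increasing in zip(
            row, columns, weights, increases_with_ageing
        ):
            if increasing:
                # As in the acidity example: divide by the dataset maximum.
                normalised = value / max(column)
            else:
                # Assumption: lowest value in the dataset maps to 1 (worst).
                normalised = min(column) / value
            total += weight * normalised
        scores.append(total)
    return scores


# Hypothetical records for three units: [acidity, BDV]; hypothetical weightages.
records = [[0.20, 30.0], [0.05, 60.0], [0.10, 40.0]]
weights = [0.7, 0.3]
scores = condition_scores(records, weights, [True, False])

# Units ordered from poorest to best condition.
order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The first unit has both the highest acidity and the lowest BDV, so it receives the maximum score of 1.0 and heads the list.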

Discussion
This section discusses the potentials of PCA and AHP application in comparison with an EF used by the utility providing the data. This EF has been fine-tuned and validated through years of operational and forensic experience involving failed and scrapped transformers by the data provider; and has been used by the data provider in asset management of their in-service transformer fleets [5]. Table 11 shows the EF ranking in decreasing order with high values implying a poorer condition.
With EF rank now, comparison can be made not just between PCA rank and AHP rank, but also with respect to the EF rank. Fig. 6 provides the comparison with transformers listed in a decreasing severity of insulation condition interpreted through PCA, EF and AHP. The shaded entries represent tied ranks as the method of condition ranking based on EF as seen in Table 11 would result in transformers that are assigned the same condition score.
Analysing Fig. 6 together with the in-service age information given in Table 1 shows that age itself does not necessarily reflect the transformer insulation condition. For instance, T20, which has been in service for 46 years, is adjudged to have the worst condition, whereas T38, which is deemed to have the best condition, has been in service for 45 years. In other words, the good condition transformers are not necessarily the ones that are younger. This is because the condition of a transformer does not just depend on age, but also on other aspects such as the loading history, design and so forth.
This paper intends to use the EF ranking as the reference to discuss the performance of the alternative techniques. As seen in Fig. 6, transformers at the two extreme ends of the list are similar in terms of the condition rankings evaluated based on the three techniques. As for the transformers in between the two extremes, the difference in rankings does exist. Nevertheless the difference that appears on the list could be in essence due to not just tied ranks by EF (where the ranks can be interchanged), but also small numerical differences in the scores evaluated from both PCA and AHP. Hence, from this perspective, groups of transformers having similar conditions are actually similar for all three techniques.
As mentioned previously, an EF that provides a representative view of insulation condition ranking requires expertise, operational and forensic experience that might not be available to all utilities interested in knowing the condition of their in-service transformer fleet based on oil test data. In this work, PCA and AHP were demonstrated to offer reliable and promising alternatives as they could offer a similar output to that based on a track-record proven EF used by the data provider. These two methods could also offer a point of reference or validation for utilities that have developed their own EFs.

Table 11 EF ranking of the 39 transformers in decreasing order (high values imply a poorer condition)
Unit  EF rank    Unit  EF rank    Unit  EF rank
T20   220        T25   110        T27   80
T1    140        T29   110        T6    70
T4    140        T36   110        T7    70
T26   140        T10   100        T12   70
T30   140        T15   90         T18   70
T33   140        T16   90         T2    60
T3    130        T21   90         T5    60
T23   130        T28   90         T22   60
T35   130        T32   90         T24   60
T39   130        T34   90         T8    50
T14   120        T37   90         T9    40
T19   120        T11   80         T13   40
T31   120        T17   80         T38

Through capturing the essence of the data presented, PCA has the merits of allowing an exploration of the relations among the original parameters and the relative rankings of the individual transformers. It is data-centred and hence can be applied directly to data of a potentially greater number of dimensions. This is useful to utilities without an in-house developed EF or if quick information on the condition of a large transformer fleet is required. Care however needs to be exercised by potential users applying it to their own databases, as statistical tests such as the KMO and Bartlett tests would need to be performed prior to PCA. Moreover, in terms of visualising the resulting PCs via a bi-plot, the VE by each of the PCs needs to be carefully considered.
On the other hand, AHP could offer a systematic way to formalise the splitting of a problem through incorporating both subjective and objective senses by establishing a hierarchy that evaluates not just qualitative (subjective preference in pairwise comparisons), but also quantitative aspects that require analyses done previously on transformer populations. In addition, AHP is easily understood, with high coherence and flexibility that allow further development once additional knowledge or information is available. The pairwise comparison matrices or even the number of levels or criteria in the hierarchy could easily be adjusted according to the requirements of asset management strategies.
Considering the promises of PCA and AHP in interpreting data for condition ranking, they can be applied to other transformer aspects. Potential usage includes individual or combined interpretation of dissolved gas analysis data, frequency response analysis data as well as data concerning the conditions of tap changer, bushing or cooling system. Besides that, PCA and AHP could also be applied to facilitate reliability evaluation of other assets such as generators, switchgears, cables and overhead lines. These interpreted conditions can then aid subsequent asset maintenance and management decisions.

Conclusion
Interpretation of transformer insulation condition facilitates transformer asset management. Apart from the traditional use of an EF, PCA and AHP were demonstrated to be a useful supplement or alternative to a track-record proven EF through interpretation of a dataset of 39 in-service UK transmission transformers measured for multiple oil test parameters. All three techniques yielded groups of transformers with similar insulation conditions, particularly for transformers at the extreme ends of the condition interpreted. PCA offers a direct, data-centred approach towards data mining for transformer condition ranking, allowing exploration of original parameter relations and relative rankings of individual units. AHP, on the other hand, provides a structured framework incorporating both subjective and objective approaches towards identifying the weightages of the oil test parameters before interpretation of oil test records for condition ranking.