Identifying the key system parameters of the organic Rankine cycle using the principal component analysis based on an experimental database

The organic Rankine cycle (ORC) is a promising technology for medium-and-low temperature heat utilization. However, how system parameters affect the system output has received little experimental investigation. Investigating the impact of each system parameter on system performance experimentally requires decoupling these system parameters. In this work, a series of experiments are conducted on a 10 kW scale ORC experiment setup. Statistical analysis is performed to identify a key parameter subset based on an experimental database. Six system parameters are obtained: temperature (Te) and pressure (pe) at the evaporator outlet, temperature (Tc) and pressure (pc) at the condenser inlet, expander shaft efficiency (ηSSE), and working fluid pump efficiency (ηP). Combined with the ORC net power output and thermal efficiency, an experimental database of system operation conditions is constructed. Subsequently, the principal component analysis (PCA) of the ORC is conducted based on the experimental database. Prediction models are developed using multi-linear regression (MLR), back propagation artificial neural network (BP-ANN), and support vector regression (SVR). Finally, accounting for the prediction performance of the models and the intercorrelation of the system parameters, the key parameter subset is determined with the exhaustive feature selection method. The results imply that the key parameter subset is (pe, ηP, pc, ηSSE). Removing further parameters from this subset, or adding more parameters to it, reduces the accuracy of the prediction models. In addition, the MLR models are slightly less accurate than the more sophisticated BP-ANN and SVR models.


Introduction
The organic Rankine cycle (ORC) is an efficient and affordable technology for utilizing low-and-medium temperature heat sources such as waste heat and renewable energy [1,2]. Because of its versatility, high efficiency and simple configuration, ORC is considered a promising technology in many applications, such as geothermal [3], solar [4], biomass [5], internal combustion engine waste heat [6][7][8], and industrial waste heat [9,10].
Efforts have been made to improve ORC system performance by optimizing system parameters through both theoretical and experimental approaches. Yang et al. [11] investigated the effect of working fluid characteristic parameters on the ORC thermodynamic performance based on a corresponding state approach. Unamba et al. [12] investigated the influence of heat source temperature, working fluid flow rates and pressure ratios on the ORC overall efficiency based on experiments. Dickes et al. [13] analyzed the influence of fluid charge on the ORC system performance and pointed out that the fluid charge and receiver size are the key parameters. Sarkar [14] evaluated the effect of pinch point design and working fluid mass flow on the subcritical-supercritical ORC system performance. Imran et al. [15] studied the effect of working fluid superheating on ORC system thermal efficiency based on experiments. Branchini et al. [16] performed a parametric investigation of an ORC system and found that the evaporation pressure was the most important parameter for improving system efficiency. Meng et al. [17] investigated the influence of centrifugal pump efficiency on ORC system performance and found that increasing the pump efficiency could improve system performance. Xi et al. [18] conducted an orthogonal experiment to analyze the sensitivity of ORC system thermal efficiency to the system parameters and proposed that the expander inlet temperature was the dominant factor.
On the one hand, many system parameters are related to the ORC system performance, so a large set of measured parameters inevitably contains redundant information. On the other hand, the ORC system performance cannot be determined by only 1 or 2 key parameters. Therefore, to obtain accurate ORC system performance based on as few system parameters as possible, it is necessary to identify the key system parameter subset based on a statistical approach.
Statistical methods can be used to interpret the relationship between the system parameters and outputs. Efforts have been made to identify the key ORC system parameters using statistical methods. Bademlioglu et al. [19,20] adopted two statistical methods, Taguchi and ANOVA, to obtain the contribution ratios and importance order of the system parameters in an ORC system. Kalina et al. [21] presented grey box and black box regression models to identify the optimal operation condition of an ORC. However, in any thermal system the system parameters are highly coupled and cannot vary independently. Principal component analysis (PCA) is a widely used statistical method that transforms raw multi-dimensional observation data into a set of orthogonal vectors. It can discover the hidden patterns in the variables and decouple these variables by constructing a few principal components that contain as much of the variable information as possible. Furthermore, PCA can reduce the dimension of the database. Recently, PCA has received much attention in thermodynamic research [22][23][24].
This paper aims to identify the key parameter subset in the ORC system based on an experimental database. Six system parameters, including temperature (Te) and pressure (pe) at the evaporator outlet, temperature (Tc) and pressure (pc) at the condenser inlet, expander shaft efficiency (ηSSE), and working fluid pump efficiency (ηP), are selected as variables. Correlations among these ORC system parameters are investigated, and the PCA is performed. The net power output and thermal efficiency are selected as ORC system performance indicators. Regression models are developed using a few machine learning methods. Finally, according to the data correlation and prediction performance of the regression models, the key parameter subset is identified using the exhaustive feature selection method. The key parameter subset can benefit the design and control of the ORC system.

Experiment
A 10 kW scale ORC test system is employed in the present work. Fig. 1 shows the system schematic diagram. The system consists of 3 circulations: conductive oil circulation, working fluid circulation, and cooling water circulation. Fig. 2 shows a photograph of the ORC test system.

Conductive oil circulation
In the experimental setup, the conductive oil circulation serves as the heat source. It consists of 3 major components: boiler, pump, and heat exchanger. In the boiler, the conductive oil is heated by an electric heater. Then, it flows into the heat exchanger and rejects heat to the working fluid circulation. Finally, it is pumped back into the boiler.

Working fluid circulation
The working fluid circulation consists of 4 components: pump, evaporator, expander, and condenser. Table 1 shows the types and parameters of the components. The working fluid is R123 [25]. It is pumped into the evaporator and heated to superheated vapor. Then, it flows into the expander and drives the machinery. Finally, it cools down in the condenser and enters the working fluid pump. Table 2 shows the main parameters of the sensors used in this circulation. The locations of temperature and pressure sensors are depicted in Fig. 1.

Cooling water circulation
The cooling water circulation serves as the heat sink. It consists of 3 major components: pump, cooling water tower, and condenser. This circulation absorbs heat from the working fluid circulation, and rejects heat to the environment.

Experimental procedure
A series of experiments are conducted on the ORC test system. The aim is to investigate how the system performance is affected by the system parameters pe, Te, pc, Tc, ηP and ηSSE. The rotation speed of the expander is controlled within 3000 ± 20 rpm in order to match the 50 Hz electricity grid frequency. The operation conditions are changed by adjusting the conductive oil circulation and the working fluid pump. During the experiment, the temperature and pressure at the evaporator outlet change from the initial state (T = 80.40 °C, p = 4.18 bar) to the end state (T = 121.40 °C, p = 10.61 bar). Meanwhile, all the measurement data are recorded. Table 3 shows the variation ranges of the ORC system experimental data.
A total of 2636 steady-state experimental data points are collected over about 220 h of operation. The mass flow rate in the system components is used to determine whether the system is at steady state: when the difference in mass flow rate between the expander inlet and the evaporator inlet is less than 50 g/s, the ORC system is considered to be at steady state. The combined uncertainty of a target variable Y is evaluated from the uncertainties of the measured variables, where ΔY represents the uncertainty of Y and ΔXi represents the uncertainty of measured variable Xi. Table 4 shows the uncertainties of the net power output and thermal efficiency.
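The combined uncertainty is commonly written as the root-sum-square propagation of the measured uncertainties. Assuming that standard form applies here (the partial derivatives ∂Y/∂Xi and the count n of measured variables are notation introduced for this sketch, not taken from the text):

```latex
\Delta Y = \sqrt{\sum_{i=1}^{n} \left( \frac{\partial Y}{\partial X_i} \right)^{2} \left( \Delta X_i \right)^{2}}
```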

Thermodynamic model
This section describes the thermodynamic framework of the ORC system. Fig. 3 shows the T-s diagram. The ORC system consists of 4 processes: the heat absorption process (5-1) in the evaporator, the expansion process (1-2) in the single screw expander, the condensation process (2-4) in the condenser, and the compression process (4-5) in the working fluid pump.
Four quantities are evaluated from the measured data: the expander shaft efficiency (ηSSE), the working fluid pump efficiency (ηP), the net power output, and the thermal efficiency.
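For orientation, conventional definitions of these four quantities, written with the state numbering of Fig. 3, take the following form. This is a sketch under assumed notation, not necessarily the exact expressions used in this work: the mass flow rate of the working fluid, the specific enthalpy h at each state point, the subscript s for the isentropic end state, the expander shaft power W_exp, and the pump power consumption W_P are all symbols introduced here for illustration.

```latex
\begin{align}
\eta_{SSE} &= \frac{W_{exp}}{\dot{m}\,(h_1 - h_{2s})} \\
\eta_{P}   &= \frac{\dot{m}\,(h_{5s} - h_4)}{W_{P}} \\
W_{net}    &= W_{exp} - W_{P} \\
\eta_{th}  &= \frac{W_{net}}{\dot{m}\,(h_1 - h_5)}
\end{align}
```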

Principal component analysis
PCA is a classic statistical method for analyzing datasets whose dimensions are correlated. It extracts the important information from the variables and transforms the original variables into a set of new orthogonal variables, i.e., principal components (PCs) [26]. The PCs are ranked by the variance of the data along their orthogonal axes: the first PC accounts for the most variance in the dataset, and each following PC accounts for the most remaining variance in a direction orthogonal to all preceding PCs.
The procedure of PCA is as follows [27]. First, the raw data matrix X is standardized to give a new matrix X*. Then, the correlation coefficient matrix R is calculated from X*. Next, the eigenvalues (λ1, λ2, …, λp), ordered by magnitude, are obtained by solving the characteristic equation |λI − R| = 0 with the Jacobi algorithm, and the eigenvector corresponding to each eigenvalue is found. Finally, the PC contribution percentages, the cumulative contribution percentages, and the PC factor loadings are calculated.
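The four steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration on synthetic data, not the implementation used in this work: `np.linalg.eigh` is swapped in for the Jacobi algorithm, the function name `pca_from_correlation` is hypothetical, and the factor loadings are assumed to be eigenvectors scaled by the square roots of their eigenvalues.

```python
import numpy as np

def pca_from_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = parameters)."""
    # Step 1: standardize each column to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation coefficient matrix R of the standardized data.
    R = np.corrcoef(X_std, rowvar=False)
    # Step 3: eigenvalues and eigenvectors of the symmetric matrix R
    # (eigh is used here instead of the Jacobi algorithm), sorted descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: contribution percentages, cumulative percentages, factor loadings.
    contribution = 100.0 * eigvals / eigvals.sum()
    cumulative = np.cumsum(contribution)
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # PC scores: the standardized data projected onto the eigenvectors.
    scores = X_std @ eigvecs
    return eigvals, contribution, cumulative, loadings, scores

# Synthetic stand-in for the database: two strongly correlated columns
# plus one independent column (not the experimental data).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
X = np.column_stack([a, a + 0.05 * rng.normal(size=500), rng.normal(size=500)])
eigvals, contribution, cumulative, loadings, scores = pca_from_correlation(X)
print(np.round(contribution, 2), np.round(cumulative, 2))
```

Because the two correlated columns collapse onto one direction, the first PC carries roughly two thirds of the total variance in this toy case, which mirrors how strongly correlated pairs of parameters concentrate in a single PC.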
In the ORC system, the two pairs of system parameters (pe and Te, pc and Tc) are always strongly correlated. The PCA reduces the dimension of the experimental database. Thus, the "basic structure" of the system parameters can be captured, i.e., the key parameters can be identified.

Regression models
This section introduces the regression models. A previous work [28] found that the traditional thermodynamic ORC model may lose accuracy because irreversible losses, such as pressure drops, heat losses and mechanical losses, are neglected. Multivariable linear regression (MLR) is a commonly used regression model with a simple structure. Back propagation artificial neural network (BP-ANN) and support vector regression (SVR) are popular machine learning regression methods. In this work, MLR, BP-ANN and SVR models are developed to correlate the experimental data. The flow chart of the calculation and training processes is shown in Fig. 4. The data are randomly divided into training (70%), validation (15%) and test (15%) sets. The training set is used for training the correlations between input and output variables. The validation set is used for tuning the intermediate parameters of the BP-ANN and SVR models during the training process. The MLR model needs no validation set, so the training and validation sets are combined (85%) and used for training. The test set is used as new data for testing the prediction performance.
The input variables are the 6 system parameters, i.e., pe, Te, pc, Tc, ηP, and ηSSE. The responses are the net power output and thermal efficiency.
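The random partition of the data points can be sketched as follows. The sketch assumes a 70/15/15 split, which is what the combined 85% training share for MLR implies; the function name `split_indices` and the seed are hypothetical choices for illustration.

```python
import numpy as np

def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]      # the remainder goes to the test set
    return train, val, test

# 2636 is the number of steady-state data points reported in the experiment.
train, val, test = split_indices(2636)
# For MLR, the training and validation indices are simply concatenated.
train_mlr = np.concatenate([train, val])
print(len(train), len(val), len(test), len(train_mlr))
```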

Details of the 3 models are shown below.
(1) MLR: MLR is a straightforward approach that uses a linear combination of the 6 input variables to predict the responses.
(2) BP-ANN: BP-ANN is a multi-layer feedforward neural network trained with the error back propagation algorithm [29]. It offers self-adaption, self-learning, nonlinear mapping, and high accuracy, and has been widely used in ORC research [30][31][32]. In this work, the MATLAB neural network toolbox is used. The network has 1 hidden layer with 9 hidden nodes. The learning rate is 0.01. The training function is Levenberg-Marquardt.
(3) SVR: SVR is an important branch of the support vector machine; it minimizes the total deviation of all sample points from the hyperplane [33]. SVR is a powerful tool that lets the user decide how much regression error to tolerate. In this work, the libsvm-3.24 toolbox [34] is used, and the grid search method is selected for tuning the model parameters.
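Of the 3 models, MLR is simple enough to sketch directly. The following is a minimal least-squares illustration on synthetic data, not the code used in this work; the function names `fit_mlr` and `predict_mlr` and the coefficient values are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate the fitted linear combination on new inputs."""
    return coef[0] + X @ coef[1:]

# Synthetic stand-in for the 6 input variables (not the experimental data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
true_coef = np.array([2.0, 0.5, -1.0, 0.0, 0.3, 1.5, -0.7])
y = true_coef[0] + X @ true_coef[1:]            # noise-free linear response
coef = fit_mlr(X, y)
print(np.round(coef, 3))
```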

Exhaustive feature selection
Exhaustive feature selection is an effective technique to find the optimal subset of relevant features. For each number of parameters (1 to 6), the key parameter subset is identified by comparing all the different combinations.
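With 6 parameters there are 2^6 − 1 = 63 non-empty subsets to compare. A minimal sketch of the search is given below under simplifying assumptions: a plain linear regression scored by test-set R² stands in for the models used in this work, the data are synthetic, and the function names `r2_linear` and `exhaustive_selection` are hypothetical.

```python
from itertools import combinations
import numpy as np

NAMES = ["pe", "Te", "pc", "Tc", "etaP", "etaSSE"]  # labels for readability only

def r2_linear(X_train, y_train, X_test, y_test):
    """Fit a linear model on the training data and return R^2 on the test data."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    pred = coef[0] + X_test @ coef[1:]
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def exhaustive_selection(X_train, y_train, X_test, y_test):
    """For each subset size k, keep the column subset with the highest test R^2."""
    n = X_train.shape[1]
    best, n_evaluated = {}, 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            cols = list(cols)
            score = r2_linear(X_train[:, cols], y_train, X_test[:, cols], y_test)
            n_evaluated += 1
            if k not in best or score > best[k][1]:
                best[k] = (tuple(cols), score)
    return best, n_evaluated

# Synthetic data in which only columns 0 and 4 drive the response.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 4]
best, n_evaluated = exhaustive_selection(X[:200], y[:200], X[200:], y[200:])
for k, (cols, score) in best.items():
    print(k, [NAMES[c] for c in cols], round(score, 4))
```

The search is only feasible because the parameter count is small; the number of subsets doubles with every added parameter.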

Correlation analysis and PCA
Before the PCA is implemented, the correlations between the system parameters (raw data) are investigated. In this work, the 4 system parameters pe, Te, pc and Tc are denoted as the working fluid state parameters, and the other 2 system parameters, ηP and ηSSE, are denoted as the equipment operating efficiency parameters. Table 5 shows the correlation coefficients between the 6 system parameters. The closer the absolute value of the correlation coefficient is to 1, the stronger the correlation between the system parameters; the closer it is to 0, the weaker the correlation. The results show that there are strong correlations between the working fluid state parameters. As expected, the correlation between the two equipment operating efficiency parameters, and the correlations of either of them with the working fluid state parameters, are less pronounced. The correlation coefficient between the equipment operating efficiency parameters is −0.548, and the correlations between the equipment operating efficiency parameters and the working fluid state parameters are even weaker, implying that the equipment operating efficiency parameters are quite independent.
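Assuming the coefficients in Table 5 are Pearson correlation coefficients (the text does not name the type), the coefficient between two parameters x and y over n data points is computed as follows, where the overbar denotes the sample mean:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
```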
In PCA, the 1st principal component (PC) contains the most variance of the data. The 2nd PC contains the most variance in the remaining dimensions, and so on. Table 6 shows the PCA factor loadings and eigenvalues. PC1, PC2, PC3, PC4, PC5 and PC6 represent the 1st, 2nd, 3rd, 4th, 5th, and 6th PCs, respectively. Fig. 5 shows the contribution percentage and cumulative contribution percentage of the PCs. The cumulative contribution percentage of PC1 and PC2 reaches 95.21%. According to the PCA, the PC1 factor loadings of the working fluid state parameters are much larger than those of the equipment operating efficiency parameters, while the PC2 factor loadings of the working fluid state parameters are much smaller than those of the equipment operating efficiency parameters. Thus, PC1 can be viewed as the indicator of the working fluid state parameters, and PC2 as the indicator of the equipment operating efficiency parameters.

Model evaluation
The determination coefficient (R²) and root mean square error (RMSE) are 2 important indicators for evaluating the performance of regression models. Figs. 6 and 7 show R² and RMSE of the MLR, BP-ANN and SVR models. All the net power output models achieve good prediction (R² of each regression model is more than 0.995, and RMSE of each is less than 0.055). R² of the MLR thermal efficiency model (R² = 0.9772) is slightly smaller than those of the BP-ANN and SVR models. RMSE of the MLR net power output and thermal efficiency models is larger than that of the BP-ANN and SVR models. SVR is the most accurate of the 3 models. Although slightly less accurate, the MLR models are comparable to the more sophisticated models.
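For reference, the two indicators can be computed as below, assuming their usual definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared residual). The function names and the four sample values are illustrative only.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the response."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values: three exact predictions and one that is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(r2_score(y_true, y_pred), rmse(y_true, y_pred))  # 0.8 0.5
```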
The correlation coefficients between each PC and the system performance indicators are shown in Table 7. The net power output is highly correlated with PC1, and its correlation coefficients with the other PCs are much smaller, which implies that the impact of the system parameters on the net power output is largely captured by PC1. However, PC2 has to be included in any model because the thermal efficiency requires the extra variance information, especially that of the equipment operating efficiency parameters.

Identification of the key parameter subset
For a real ORC system, many parameters can be obtained from the installed sensors. Clearly, information from more than 1 or 2 of these parameters is needed to account for the variance along the many dimensions. However, many of these parameters are strongly correlated with each other, as discussed in the previous sections. Furthermore, some of the parameters are not as important as others. Most importantly, a model with many parameters is very likely to be over-trained. Consequently, to obtain an accurate predictive model, it is desirable to identify a key parameter subset.
As discussed in Section 4.1, most of the system parameter information is extracted in PC1 and PC2. In this section, MLR models are developed using PC1 and PC2 as input variables; these models are denoted as PCA-MLR. With this approach, the key parameter subset is identified. In what follows, all results are for the test set, i.e., predictions on new data. Fig. 8 shows the system parameters used in the PCA-MLR models with 1-6 system parameters. Fig. 9 shows R² of the PCA-MLR models for the net power output and thermal efficiency using different numbers of system parameters. PC1 and PC2 are used as input variables of the PCA-MLR models, as mentioned above; in the particular case in which only 1 system parameter is used, only PC1 is available. The feature selection is exhaustive, i.e., for each number of system parameters, all combinations of the 6 system parameters are compared. One should note that early removal does not indicate low importance of the removed parameter: the removal sequence reflects both the importance of the system parameters to system performance and their correlation with other system parameters. As discussed above, Te and Tc are removed as the number of variables is reduced from 6 to 4 because they are strongly correlated with pe and pc, respectively. With 4 system parameters, R² reaches its maximum for both the net power output and the thermal efficiency. Compared with R² of the MLR models using all 6 system parameters as input variables, R² of the PCA-MLR models is slightly higher. R² of both PCA-MLR models decreases as the number of system parameters is reduced from 4 to 1. Fig. 10 shows RMSE of the PCA-MLR models for the net power output and thermal efficiency using different numbers of system parameters. The behavior is similar to that of R². Thus, the key parameter subset can be identified as (pe, ηP, pc, ηSSE). Fig. 11 shows the relative deviation of the PCA-MLR net power output model and the absolute deviation of the PCA-MLR thermal efficiency model. The relative deviation of the PCA-MLR model for the net power output is between −5% and 5%, while the absolute deviation of the PCA-MLR model for the thermal efficiency is between −0.15% and 0.15%.
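The PCA-MLR idea (regress the response on the first 2 PCs of a chosen parameter subset) can be sketched as follows. This is an illustration on synthetic data under stated assumptions, not the authors' implementation: the PCs are computed from the correlation matrix of the training data only, `np.linalg.eigh` performs the eigen-decomposition, and the names `fit_pca_mlr` and `predict_pca_mlr` are hypothetical.

```python
import numpy as np

def fit_pca_mlr(X_train, y_train, n_pcs=2):
    """Standardize, project onto the leading PCs, then fit a linear regression."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    R = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(R)
    n_pcs = min(n_pcs, Z.shape[1])             # with 1 parameter only PC1 exists
    order = np.argsort(eigvals)[::-1][:n_pcs]  # indices of the largest eigenvalues
    W = eigvecs[:, order]                      # projection onto the leading PCs
    A = np.column_stack([np.ones(len(Z)), Z @ W])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return {"mean": mean, "std": std, "W": W, "coef": coef}

def predict_pca_mlr(model, X):
    """Apply the training standardization and projection, then the linear model."""
    scores = ((X - model["mean"]) / model["std"]) @ model["W"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic subset of 4 parameters: two strongly correlated pairs driven by
# two latent factors, loosely mimicking a state pair and an efficiency pair.
rng = np.random.default_rng(2)
f1, f2 = rng.normal(size=400), rng.normal(size=400)
X = np.column_stack([
    f1, f1 + 0.05 * rng.normal(size=400),
    f2, f2 + 0.05 * rng.normal(size=400),
])
y = 20.0 + 2.0 * f1 + 1.0 * f2

model = fit_pca_mlr(X[:300], y[:300], n_pcs=2)
y_test = y[300:]
pred = predict_pca_mlr(model, X[300:])
rel_dev = 100.0 * (pred - y_test) / y_test     # relative deviation in percent
r2 = 1.0 - np.sum((y_test - pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print(round(float(r2), 4), round(float(np.max(np.abs(rel_dev))), 3))
```

Regressing on 2 orthogonal PCs rather than on 4 correlated parameters keeps the model small and avoids the ill-conditioning that strongly correlated inputs cause in ordinary MLR.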

Conclusions
This paper has identified the key parameter subset in the ORC system based on an experimental database obtained from a 10 kW scale ORC system. Correlations between the system parameters are investigated, and the parameters are decoupled using PCA. Prediction models are developed using MLR, BP-ANN, and SVR. Key parameters are identified based on an exhaustive search. The major conclusions are:
(1) The determined key parameter subset consists of 4 system parameters: (pe, ηP, pc, ηSSE). The selected key parameter subset includes information from all 4 major components. Further removal of parameters would result in loss of information (and thus reduced accuracy). Including more system parameters would result in over-training (and thus reduced accuracy).
(2) The MLR models are slightly less accurate than the more sophisticated BP-ANN and SVR models. When the system parameters are replaced as input variables by the first 2 PCs of the key parameter subset, the resulting PCA-MLR model has a simple structure and acceptable accuracy. The relative deviation of the net power output is between −5% and 5%. The absolute deviation of the thermal efficiency is between −0.15% and 0.15%.
(3) The working fluid state parameters are more strongly correlated with each other than the equipment operating efficiency parameters are. The correlation coefficients between the working fluid state parameters are all larger than 0.974, while the correlation coefficients of the equipment operating efficiency parameters with the other system parameters are smaller than 0.6.
(4) Increasing the evaporating pressure of the ORC system, reducing the condensing pressure, and improving the working fluid pump efficiency and the expander shaft efficiency can improve ORC system performance.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.