A Differential Game of Ecological Compensation Criterion for Transboundary Pollution Abatement under Learning by Doing

This paper studies a stochastic differential game of transboundary pollution abatement under two kinds of ecological compensation and the abatement policy, in which learning by doing is taken into account. Emission and pollution abatement between the upstream and downstream regions of the same basin is modeled as a Stackelberg game, in which the downstream region provides economic compensation for pollution abatement in the upstream region. We discuss the feedback Nash equilibrium strategies under proportional compensation and investment compensation and find that an appropriate ecological compensation ratio can raise the level of pollution abatement investment in the two regions through the experience accumulated in the process of learning by doing. In the long term, the investment compensation mechanism is an effective transboundary pollution abatement measure that can continuously reduce the water pollution stock in both the upstream and downstream regions.


Introduction
Environmental degradation and its relationship with economic activities have been a subject of intense debate in the academic community and among policy makers. One of the most important environmental problems is water pollution. According to Li [1], pollution in one region can be transported across hundreds and even thousands of kilometers; hence, it is called "transboundary pollution". The natural, transregional mobility of water flow makes the regional division and management of water resources very complicated, which has led to an increasingly serious problem of transboundary water pollution in river basins. Dam construction and random discharge of harmful pollutants happen frequently, and regional conflicts are becoming more and more serious, even triggering panic across society. Protecting river basins from transboundary water pollution has become both a focus and a difficulty for local government departments.
Differential games provide an effective tool to study pollution control problems and to analyze the interactions between the participants' strategic behaviors and the dynamic evolution of pollution. Some researchers have turned their attention to transboundary industrial pollution problems in recent years. For example, Yeung [2] first derived time-consistent solutions in a cooperative differential game and studied pollution management in a stochastic differential game framework. In [3], a cooperative stochastic differential game of transboundary industrial pollution is presented, and a payment distribution mechanism is derived to maintain subgame consistency. Additionally, there are several published studies of transboundary pollution problems from other perspectives, such as renewable resources, clean technologies, harmonization of international and domestic law, abatement cost, and R&D spillovers (for instance, [4-7]). Papers [1,3,8] study the problem of transboundary pollution while taking emission permits trading into account. This literature shows that coordinating countries' emission strategies leads to a lower total level of pollution and to higher total welfare than noncooperative emission strategies.
This diffusion of pollutants is common. In each region, those who suffer from pollution wish that the polluters in neighboring regions would either reduce their pollution or compensate for the damage, and there is generally a game between the sufferers and the polluters over pollution abatement and benefit compensation [9]. Ecological compensation is an important mechanism in environmental economics that can be used to promote and coordinate harmonious development between humans and nature, and it is an effective way to reduce water pollution and limit the discharge of pollutants. In recent years, watershed studies of ecological compensation have calculated standard values of ecological compensation that account for water, land, waterways, and society and have specifically considered aspects such as water volume, water ecology, and regional economy [10]. Evaluation of the ecological service function [11] and the willingness-to-pay approach [12,13] have been used in most of these studies. A river basin ecological compensation model should reflect multiple factors; a compensation model that includes only a single aspect will fail to meet the fundamental requirements of river basin ecological compensation. This means that the traditional method of focusing solely on water quality should be abandoned, and objective factors that reflect regional heterogeneity should be included in the methods for setting ecological compensation standards in river basins [14,15]. The eco-compensation criterion is considered one of the appropriate instruments for internalizing the spillover of public goods, which encourages the establishment of a longitudinal compensation relation between polluted downstream regions and upstream regions through capital supply, industrial transfer, water rights, and carbon trading [16-18]. Heitzig et al. [19] believe that, through reasonable ecological and economic compensation methods, pollution abatement losses in some areas of the basin's water environment can be compensated, and failed environmental compensation cooperation can be further corrected.
Articles using differential game methods to study ecological compensation mechanisms are relatively scarce [20,21], especially those that combine ecological compensation standards with the accumulation of transboundary pollution abatement technologies. The interaction between ecological, social, and economic systems, as well as pollution abatement between administrative regions, from upstream to downstream, has been considered to ensure the foundation of ecological compensation for the water environment in the basin. This work studies the transboundary water pollution control problem between two regions, characterizing and contrasting feedback Nash strategies under two kinds of ecological compensation mechanisms. The main purpose is to characterize the parameter spaces in which governments or organizations in two neighboring regions can formulate a consistent ecological compensation mechanism and abatement policy under the influence of regional impacts. In addition, we explore a differential game model of the interregional ecological compensation mechanism covering the upstream and downstream regions in continuous time. At the same time, the accumulation of experience with pollution abatement technology, that is, learning by doing, is taken into account in this work. Experience is accumulated continuously in the process of learning by doing, which means that the abatement cost decreases as abatement technology improves.

The rest of this paper is organized as follows. In Section 2, we establish our basic dynamic general equilibrium model between two kinds of ecological compensation and pollution abatement. The optimal abatement levels and optimal pollution stock paths under proportional compensation and investment compensation are presented in Section 3 and Section 4, respectively. Some discussions are provided with several numerical examples in Section 5. Finally, Section 6 concludes the paper.

The Basic Model
There are two adjacent regions ($n = 1, 2$) in a river basin, which we call upstream region 1 and downstream region 2, in our transboundary pollution model. It is assumed that both regions discharge organic pollutants into the river basin. For region $n$ ($n = 1, 2$), production always generates a quantity of by-products, namely, emissions $E_n(t)$. Let $Q_n(t)$ denote the industrial output of region $n$, so that, at time $t$, the industrial outputs of region 1 and region 2 are $Q_1(t)$ and $Q_2(t)$, and the instantaneous pollution emissions of region 1 and region 2 are $E_1(t)$ and $E_2(t)$, respectively. It is assumed that pollution emissions in each region are positively related to industrial production, and the instantaneous linear production function can be expressed as

$$Q_n(t) = Q_n\big(E_n(t)\big). \tag{1}$$

According to [22-24], it is assumed that the regional industrial income function is $R_n(Q_n(t))$, which can be expressed by the following quadratic concave functional form in terms of emissions:

Environmental pollution damage cost is a linear function of the pollution stock $P_n(t)$; following [23], the cost function is $D_n(t) = D_n P_n(t)$ ($D_n \ge 0$), where $D_n$ indicates the degree of environmental damage per unit of pollution stock in region $n$. The difference between the cooperative solution and the noncooperative solution depends on the spillover effects of benefits and costs, which are likely asymmetric due to geographic and economic differences between the regions. The upstream region imposes a negative unidirectional water burden upon the downstream region by preventing the latter from reaching an unconstrained optimal water quality. Compensation through side payments is a viable way to create an incentive for cooperation. Even where resource flows occur unidirectionally, precluding mutual control over negative externalities, there is the possibility of Pareto optimal or "win-win" solutions through side payments.

Discrete Dynamics in Nature and Society

According to the payment principle of the ecological compensation mechanism, the polluted downstream region provides economic compensation for the upstream region's investment in pollution control. This compensation is used to promote pollution control in the upstream region, thereby reducing the transfer of pollutants to the downstream basin and improving the quality of the water environment. It is assumed that there are two types of compensation:
(i) Proportional compensation means compensating for a certain proportion of the investment in pollution abatement. We assume that the proportion of ecological compensation paid by the downstream region to the upstream region is $B(t)$, and the proportion is determined by the downstream region.
(ii) Investment compensation means that the downstream region invests in the upstream region to reduce pollution so as to ensure that fewer pollutants are transferred to the downstream region.

The investment intensity function of pollution abatement is $I_n(t)$ ($n = 1, 2$). Pollution abatement can be realized only when technology and labor are invested. Therefore, the regions face an abatement cost, which decreases net revenue. Following [22], we assume that the abatement cost can be described by the following quadratic form:

where $\mu_n$ ($n = 1, 2$) are positive constants that measure the difference between the two regions' ability to master the abatement technology. Equation (3) means that the marginal cost is increasing in the level of pollution abatement. Following [25,26], the experience of applying pollution abatement technology, $G_n(t)$, is measured by the cumulative abatement from time 0 to $t$; that is,

We assume $\dot{G}_n(t) > 0$, where $G_{n0}$ ($n = 1, 2$) denotes the initial experience level of applying pollution abatement technology. Similarly, $g_n$ ($n = 1, 2$) is a positive parameter that represents the difference between the two regions' ability to accumulate experience. According to the learning-by-doing theory, cumulative experience leads to a decline in the unit cost.
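To make the experience mechanism concrete, the following minimal Python sketch accumulates experience along a given abatement investment path. It assumes, purely for illustration, that experience takes the form $G_n(t) = G_{n0} + g_n \int_0^t I_n(s)\,ds$, which agrees with the verbal description above but is not confirmed by an equation reproduced in this text. The function name `cumulative_experience` and all numerical values are hypothetical.

```python
def cumulative_experience(times, investment, g, G0):
    """Approximate G(t_k) = G0 + g * integral_0^{t_k} I(s) ds (assumed form).

    times      -- increasing sample times t_0 < t_1 < ...
    investment -- abatement investment intensity I(t_k) at each sample time
    g          -- positive experience-accumulation parameter (g_n in the text)
    G0         -- initial experience level (G_n0 in the text)
    """
    if len(times) != len(investment):
        raise ValueError("times and investment must have the same length")
    experience = [G0]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        # Trapezoidal rule for the integral of I over [t_{k-1}, t_k].
        area = 0.5 * (investment[k] + investment[k - 1]) * dt
        experience.append(experience[-1] + g * area)
    return experience


# Hypothetical example: constant investment intensity of 2 over four periods.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
investment = [2.0] * len(times)
G = cumulative_experience(times, investment, g=0.5, G0=1.0)
print(G)
```

With a positive investment path the resulting sequence is strictly increasing, which is the discrete counterpart of the condition $\dot{G}_n(t) > 0$.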
It is well known that, as time and temperature change, water in nature has a certain self-purification ability. Without loss of generality, it is assumed that the basin has a purification rate for water pollution, namely, the coefficient of self-purification ability of water $\theta_n$ ($n = 1, 2$), with $\theta_1, \theta_2 \ge 0$. Pollution in the upstream of the basin mainly comes from the discharge of local polluting industries, while pollution in the downstream comes from the transfer of pollution from the upstream of the basin in addition to the discharge of polluting enterprises in the downstream region. The amount of pollution transferred from upstream to downstream is $dP_1(t)$, where $d$ is the transfer coefficient and $0 \le d \le 1$. The number of initial emission permits in region 1 is $E_{10}$ and in region 2 is $E_{20}$. It is assumed that the emission trading market is fully competitive and the price of emission permits is constant at $w$. If emissions exceed the initial allocation, additional emission rights can be purchased in the permit market. Conversely, if there is a surplus of emission rights, it can be reserved for the next year or sold in the emission permit trading market.
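As a small illustration of the permit market just described, the sketch below computes the instantaneous cash flow from permit trading for one region. It assumes that the whole difference between the initial allocation and actual emissions is settled at the constant price $w$, and it ignores the option of reserving surplus permits for the next year. The function name `permit_cash_flow` and the numbers are hypothetical.

```python
def permit_cash_flow(emissions, initial_permits, price):
    """Cash flow from permit trading under the assumed linear settlement rule.

    A positive value is revenue from selling surplus permits; a negative
    value is the cost of buying permits to cover excess emissions.
    """
    if price < 0:
        raise ValueError("permit price must be nonnegative")
    return price * (initial_permits - emissions)


# Hypothetical values: allocation of 10 units, permit price w = 3.
print(permit_cash_flow(emissions=12.0, initial_permits=10.0, price=3.0))  # region buys permits
print(permit_cash_flow(emissions=7.0, initial_permits=10.0, price=3.0))   # region sells surplus
```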
Given the above assumptions, we can get the concrete expression of the income function $W_n(t)$ of the two regions:

Proportional Compensation
The amount of compensation provided by the downstream region in the same basin affects the upstream region's willingness to invest in pollution control and the amount of pollutant discharge in that region. In turn, the amount of pollutant discharge in the upstream region affects the amount of pollutant discharge in the downstream region, which constitutes a dynamic game relationship between the two sides. Because decisions are made in continuous time, this constitutes a dynamic differential game.
Under this model, both regions aim to maximize the net present value of their own long-term income. The pollution discharge in the upstream region affects the income of the downstream region through the pollution stock in the same water basin. The independent discharge decisions of the two regions constitute a differential game with $E_1(t)$, $E_2(t)$, $I_1(t)$ as control variables and $P_1(t)$, $P_2(t)$ as state variables, in which each region aims to maximize the net present value of its own income. The stocks of pollution in the upstream and downstream regions of the basin at time $t$ are $P_1(t)$ and $P_2(t)$, and the pollution stocks in the two regions can be expressed by the following two differential equations:

where $\theta_1 P_1(t)$ and $\theta_2 P_2(t)$ represent the amounts of pollution removed by self-purification, $I_1(t)$ is the investment intensity of pollution abatement in the upstream region, and $dP_1(t)$ is the amount of pollution transferred from upstream to downstream.

The current goal of region 1 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. Hence, the objective functional and the constraint conditions of region 1 can be given as follows:

As above, the current goal of downstream region 2 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. Hence, the objective functional and the constraint conditions of region 2 can be given as follows:

In a transboundary pollution control problem, the neighboring regions can be seen as players, and under a noncooperative game each aims at maximizing its own net present profit. Under the proportional compensation mechanism, the problem is a Stackelberg differential game in which the downstream region determines its pollution emissions and the compensation ratio, and the upstream region determines its pollution emissions and abatement. Each player tries to maximize its own net revenue by choosing the optimal emission path and the optimal abatement level. In order to obtain the optimality conditions for the optimal control problems, we use Pontryagin's maximum principle.
The current-value Hamiltonian function of (7) is

The current-value Hamiltonian function of (9) is

where $\lambda_n$ ($n = 1, 2, 3$) are the dynamic adjoint variables associated with the state equations for $P_n(t)$. Here, the dual variables $\lambda_1$, $\lambda_2$, and $\lambda_3$, also called shadow prices or costate variables, are Lagrange multipliers, which are the derivatives of the two players' value functions, i.e., revenues, with respect to the pollution stock $P_n(t)$.
To maximize (11) and (12), the first-order conditions of the current-value Hamiltonian functions are the following:

Along with the current-value costate equations:

The initial state values and transversality conditions are as follows: $P_1(0) = P_{10}$, $P_2(0) = P_{20}$, $P_n(T) \ge 0$,

Solving (13), (14), and (15), we have

Thus, we can obtain the following results from (16), (17), and (18):

Proposition 1. The conditions given above are necessary. Next, we use the sufficiency theorem (O. L. Mangasarian's theorem) to prove that the necessary conditions are also sufficient.
The functions $W(\cdot)$ and $\dot{P}(t)$ are differentiable in this problem, and all functions are concave in $(P, E, I)$. In addition, $\dot{P}(t)$ is linear in $P$, $E$, $I$, so, in the optimal solution, no sign restriction on $\lambda(t)$ is required. Therefore, we know from Mangasarian's theorem that, for the optimal control problem in this paper, the necessary condition of the Pontryagin maximum principle is also a sufficient condition for the global maximization of $J(\cdot)$.
For the proof that the theorem applies, see the Appendix.
Substituting (22) into (12), we get the present-value Hamiltonian function of downstream region 2:

To maximize (26), the first-order conditions of the Hamiltonian function are the following:

Solving (27), we get the optimal investment compensation ratio function of downstream region 2 to upstream region 1 as follows:

Numerically, solving an ordinary differential equation system means computing a sequence of points that are close to the graph of the true solution. The numerical solution algorithm is based on the numerical integration of the system using the Runge-Kutta method. In numerical analysis, the Runge-Kutta methods are an important family of implicit and explicit iterative methods for approximating the solutions of ordinary differential equations. Our aim in this section is to study the numerical solution of the differential equation (22) to determine an optimal control strategy and to discuss its time evolution. The parameters of (22) used in the numerical solution are presented in Table 1 [27-29].
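The numerical integration step mentioned above can be sketched with the classical fourth-order Runge-Kutta scheme. The Python code below is a generic illustration, not the authors' implementation: the text does not say which Runge-Kutta variant was used, and the right-hand side `make_pollution_rhs` is an assumed two-region stock system with constant controls, $\dot{P}_1 = E_1 - I_1 - \theta_1 P_1$ and $\dot{P}_2 = E_2 + dP_1 - \theta_2 P_2$, chosen only to match the verbal description of the terms. All parameter values are hypothetical and are not taken from Table 1.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the system y' = f(t, y) by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f, y0, t0, t1, steps):
    """Integrate y' = f(t, y) from t0 to t1 with a fixed number of RK4 steps."""
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def make_pollution_rhs(E1, E2, I1, theta1, theta2, d):
    """Assumed stock dynamics with constant emissions and abatement (illustrative)."""
    def rhs(t, P):
        P1, P2 = P
        return [E1 - I1 - theta1 * P1,        # upstream: emissions, abatement, self-purification
                E2 + d * P1 - theta2 * P2]    # downstream: emissions, inflow d*P1, self-purification
    return rhs


# Sanity check of the integrator on y' = -y, whose exact solution is exp(-t).
approx = integrate(lambda t, y: [-y[0]], [1.0], 0.0, 1.0, 100)[0]
print(abs(approx - math.exp(-1.0)))

# Hypothetical parameters; the stocks approach (E1 - I1)/theta1 and (E2 + d*P1)/theta2.
rhs = make_pollution_rhs(E1=5.0, E2=4.0, I1=1.0, theta1=0.5, theta2=0.5, d=0.25)
print(integrate(rhs, [0.0, 0.0], 0.0, 60.0, 600))
```

A fixed step size keeps the sketch short; an adaptive solver would be the more usual choice when the controls vary quickly in time.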
However, it is not difficult to see from (22) and (28) that the abatement level $I_1(t)$ in region 1 and the eco-compensation ratio $B(t)$ form an interconnected dynamic system. In order to obtain a numerical solution of the abatement level, we assume that the eco-compensation ratio $B(t)$ is 0.2, 0.4, 0.6, 0.7, 0.8, and 0.9, respectively. The optimal numerical solution of the abatement level under each value of $B(t)$ is obtained using the Runge-Kutta algorithm and is displayed in Figure 1.
From Figure 1, we find that as the proportion of ecological compensation increases, that is, as downstream region 2 gives more economic compensation to upstream region 1, the abatement level in upstream region 1 initially increases. However, once the compensation ratio exceeds a certain value, about 0.6 in our simulation, the abatement level is drastically reduced. In other words, a suitable ecological compensation ratio increases the upstream region's enthusiasm for pollution control investment, whereas excessive economic compensation from the downstream region greatly reduces this enthusiasm and increases the reliance on such compensation.
Next, we simulated the system equation (6) of the pollution stock in the upstream and downstream and obtained the optimal numerical solutions $P_1$ and $P_2$ of the pollution stock in the two regions. The results are shown in Figure 2.
It can be seen from Figure 2 that, with the passage of time, under the dual effects of pollution abatement investment in region 1 and economic compensation from downstream region 2, the pollution stock in the upstream will gradually decrease as the investment of pollution abatement increases. In the short term, the pollution stock in the downstream region will increase sharply because of the transfer of pollutants from the upstream region. However, in the long run, with the increasing investment in pollution abatement in the upstream region, the pollution stock transfer will decrease relatively, which has a great mitigation effect on the pollution stock in the downstream region.
In order to evaluate the impact of parameter changes on the prediction results of the model, we performed a sensitivity analysis by changing the parameter $g_n$ while keeping the other parameters fixed, to study the effect of changes in system parameter values on the objective function value. The analysis results are shown in Figures 3 and 4. From both figures, it can be concluded that the pollution abatement level is the most sensitive to changes in the parameter $g_n$, and the pollution stock is the least sensitive to changes in the parameter $g_n$.
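The one-parameter-at-a-time procedure described above can be sketched as follows. This is a generic illustration in Python, not the authors' code: `toy_output` is a hypothetical stand-in for the model output (it is not an equation from this paper), and the helper names `one_at_a_time` and `relative_sensitivity` are assumptions introduced here.

```python
def one_at_a_time(model, base_params, name, values):
    """Evaluate the model for each value of one parameter, all others at baseline."""
    results = []
    for value in values:
        params = dict(base_params)   # copy so the baseline is never modified
        params[name] = value
        results.append((value, model(**params)))
    return results


def relative_sensitivity(model, base_params, name, rel_step=0.01):
    """Finite-difference estimate of (percent change in output) / (percent change in parameter)."""
    base = model(**base_params)
    bumped = dict(base_params)
    bumped[name] = base_params[name] * (1.0 + rel_step)
    return ((model(**bumped) - base) / base) / rel_step


def toy_output(g, a, b):
    """Hypothetical output that falls as the learning parameter g rises."""
    return a / (1.0 + g * b)


baseline = {"g": 1.0, "a": 1.0, "b": 1.0}
for g_value, output in one_at_a_time(toy_output, baseline, "g", [0.5, 1.0, 2.0]):
    print(g_value, output)
print(relative_sensitivity(toy_output, baseline, "g"))
```

Comparing the relative sensitivities of several outputs with respect to the same parameter is one way to rank them from most to least sensitive.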

Investment Compensation
In the investment compensation model, the downstream region of the basin invests in pollution abatement in the upstream region to reduce the degree of water pollution upstream and the amount of pollution transferred downstream. In the long run, the cost to the downstream region of investing in pollution control is relatively lower, which is conducive to the long-term benefits of water environment improvement in the downstream region. This model of compensation through investment in another region can better exploit interregional cooperation and integrate various forms of capital to solve the dilemma of pollution abatement in river basins.
Unlike in the proportional compensation model, the assumptions on the pollution abatement cost, the learning-by-doing function, and the pollution stock dynamics in the upstream and downstream regions change accordingly. The investment cost of water pollution abatement in downstream region 2 and the investment cost of water pollution control in upstream region 1 can be expressed as follows:

where $I_{22}(t)$ indicates the investment intensity of downstream region 2 in its local area, and $I_{21}(t)$ indicates the investment intensity of the downstream region in upstream region 1. Correspondingly, the learning-by-doing function also changes because of the different investment intensities:

In this model, the stocks of pollution in the upstream and downstream regions of the basin at time $t$ are $P_1(t)$ and $P_2(t)$, and the pollution stocks in the two regions can be expressed by the following two differential equations:

where $\theta_1 P_1(t)$ and $\theta_2 P_2(t)$ represent the amounts of pollution removed by self-purification, $I_{11}(t)$ is the investment intensity of pollution abatement in the upstream region, and $dP_1(t)$ is the amount of pollution transferred from upstream to downstream.
The current goal of region 1 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. Hence, the objective functional and the constraint conditions of region 1 can be given as follows:

As above, the current goal of downstream region 2 is to maximize the expected present flow of instantaneous net revenue in terms of the emission path and the abatement level. Hence, the objective functional and the constraint conditions of region 2 can be given as follows:

The current-value Hamiltonian function of (33) is

The current-value Hamiltonian function of (35) is

where $\lambda_n$ ($n = 1, 2, 3$) are the dynamic adjoint variables associated with the state equations for $P_n(t)$. Here, the dual variables $\lambda_1$, $\lambda_2$, and $\lambda_3$, also called shadow prices or costate variables, are Lagrange multipliers, which are the derivatives of the two players' value functions, i.e., revenues, with respect to the pollution stock $P_n(t)$.
To maximize (37) and (38), the first-order conditions of the current-value Hamiltonian functions are the following:

Solving (39)-(43), we have

Proposition 2. The conditions given above are necessary. Next, we use the sufficiency theorem (O. L. Mangasarian's theorem) to prove that the necessary conditions are also sufficient.

The functions $W(\cdot)$ and $\dot{P}(t)$ are differentiable in this problem, and all functions are concave in $(P, E, I)$. In addition, $\dot{P}(t)$ is linear in $P$, $E$, $I$, so, in the optimal solution, no sign restriction on $\lambda(t)$ is required. Therefore, we know from Mangasarian's theorem that, for the optimal control problem in this paper, the necessary condition of the Pontryagin maximum principle is also a sufficient condition for the global maximization of $J(\cdot)$.
For the proof that the theorem applies, see the Appendix. From (49), (50), and (51), we can see that the abatement levels $I_{11}(t)$, $I_{21}(t)$, and $I_{22}(t)$ form an interconnected dynamic system. In order to obtain a numerical solution of the abatement levels, we use the Runge-Kutta algorithm. The numerical solutions are plotted in Figure 5, in which the parameters are from Table 1. In the initial stage, the investment level of pollution abatement in the downstream region is higher than that in the upstream region, so as to achieve the overall goal of water pollution control. Owing to the long-term effect of the investment compensation mechanism, as the investment compensation level in the downstream region increases, the quality of the water environment improves within a certain time. Because of this, the investment pressure of downstream pollution control is reduced, and the investment level of pollution abatement tends to grow slowly. This can be clearly seen from the $I_{22}(t)$ curve in Figure 5.
Next, we simulated the system equation (32) of the pollution stock in the upstream and downstream and obtained the optimal numerical solutions $P_1$ and $P_2$ of the pollution stock in the two regions. The results are shown in Figure 6.
From Figure 6, we can see that, under the dual effects of pollution abatement investment in region 1 and economic compensation from downstream region 2, the pollution stock in the upstream will gradually decrease as the investment of pollution abatement increases. In the short term, the pollution stock in the downstream region will increase sharply because of the transfer of pollutants from the upstream region. However, in the long run, with the increasing investment in pollution abatement in the upstream region, the pollution stock transfer will decrease relatively, which has a great mitigation effect on the pollution stock in the downstream region.
In order to evaluate the impact of parameter changes on the prediction results of the model, we performed a sensitivity analysis by changing the parameter $g_n$ while keeping the other parameters fixed, to study the effect of changes in system parameter values on the objective function value. The analysis results are shown in Figure 7.

Discussion
The biggest difference between the proportional compensation mechanism and the investment compensation mechanism is that proportional compensation directly compensates the pollution abatement behavior of the upstream region in the form of funds. Therefore, this kind of compensation is timely, has a high capital availability rate, and rapidly raises enthusiasm for pollution abatement, so it can produce better environmental treatment effects in a short period of time.
This can be seen from the path of the pollution stock in Figure 8, where the dotted line indicates the trajectory of the pollution stock under the proportional compensation mechanism. However, the investment compensation mechanism requires a higher investment cost of environmental governance and a longer time to take effect. In the short term, the effect of the investment compensation mechanism on water pollution control in river basins is relatively poor, but, in the long term, the investment in pollution abatement in the upstream region can play a strong sustained role and can continuously reduce the pollution level in the waters over a longer period of time. The solid line in Figure 8 reflects this process. Therefore, the investment compensation mechanism is a sustainable investment mechanism for transboundary water pollution abatement.

Conclusion
This paper discusses the dynamic optimal strategies in the transboundary pollution game under two kinds of ecological compensation mechanism and the abatement policy, in which learning by doing is taken into account. Experience is accumulated continuously in the process of learning by doing, which means that the abatement cost decreases as abatement technology improves. By solving the dynamic equations and running numerical simulations, we find that the two regions raise their abatement levels and reduce the pollution stocks when learning by doing is considered. At the same time, we also found the following:
(i) As the proportion of ecological compensation increases, the abatement level in upstream region 1 initially increases. Once the compensation ratio exceeds a certain value, about 0.6 in our simulation, the abatement level is drastically reduced. Under the investment compensation mechanism, by contrast, the investment level of pollution abatement in the downstream region is higher than that in the upstream region in the initial stage. Owing to the long-term effect of the investment compensation mechanism, as the investment compensation level in the downstream region increases, the quality of the water environment improves within a certain time.
(ii) Under both proportional compensation and investment compensation, the pollution stock in the upstream gradually decreases as the investment in pollution abatement increases. In the short term, the pollution stock in the downstream region increases sharply because of the transfer of pollutants from the upstream region. In the long run, however, the pollution stock transfer decreases, which has a great mitigation effect on the pollution stock in the downstream region. The process and speed of the decrease in pollution stocks differ between the two cases.
(iii) Generally speaking, a suitable ecological compensation ratio increases the upstream region's enthusiasm for pollution control investment. However, excessive economic compensation from the downstream region greatly reduces this enthusiasm, thereby increasing the reliance on such economic compensation.
(iv) In the long term, the investment in pollution abatement in the upstream region can play a strong sustained role and can continuously reduce the pollution level in the waters over a longer period of time. Therefore, for the environmental regulation department, the investment compensation mechanism is a sustainable one for transboundary water pollution abatement.

Appendix

The Hamiltonian function is

$$H = F(t, y, u) + \lambda f(t, y, u),$$

where $F(\cdot)$ represents the function $W(\cdot)$ in the paper, $f(\cdot)$ represents the state function $\dot{P}(t)$, $y$ represents $P(t)$, and $u(t)$ represents the variables $E(t)$ and $I(t)$.

The optimal control path $u^*(t)$ and the corresponding $y^*(t)$ and $\lambda^*(t)$ paths must satisfy the maximum principle, so

$$\frac{\partial H}{\partial u} = F_u(t, y^*, u^*) + \lambda^* f_u(t, y^*, u^*) = 0.$$

This paper assumes that the problem has a vertical terminal line, with initial and transversality conditions, respectively:

Functions $F$ and $f$ are concave in $(y, u)$. Hence, for two distinct points in the domain of definition, $(t, y^*, u^*)$ and $(t, y, u)$, we have

$$F(t, y, u) - F(t, y^*, u^*) \le F_y(t, y^*, u^*)(y - y^*) + F_u(t, y^*, u^*)(u - u^*), \tag{A.7}$$

$$f(t, y, u) - f(t, y^*, u^*) = f_y(t, y^*, u^*)(y - y^*) + f_u(t, y^*, u^*)(u - u^*). \tag{A.8}$$

By integrating both sides of (A.7) and (A.8) over $[0, T]$, we get

$$J - J^* \le \int_0^T \left[ F_y(t, y^*, u^*)(y - y^*) + F_u(t, y^*, u^*)(u - u^*) \right] dt = \int_0^T \left\{ -\left[ \dot{\lambda}^*(y - y^*) + \lambda^* f_y(t, y^*, u^*)(y - y^*) \right] - \lambda^* f_u(t, y^*, u^*)(u - u^*) \right\} dt.$$

From this, it is proved that $J^*$ is the global maximum. This ends the proof.

Data Availability
The variable data used to support the findings of this study are included within the article (Table 1).

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.