Mathematical Modeling of Public Opinions: Parameter Estimation, Sensitivity Analysis, and Model Uncertainty Using an Agree-Disagree Opinion Model

In this paper, we present a mathematical model that describes agree-disagree opinions during polls. We first present the different compartments of the model. Then, using the next-generation matrix method, we derive thresholds for the stability of equilibria. We consider two sets of data from the Reuters polling system regarding the approval rating of the U.S. President over two terms. These two weekly polls track the percentage of Americans who approve and disapprove of the way the President manages his work. To validate the underlying model, we use nonlinear least-squares regression to fit the model to the actual data. In the first poll, we use only the first 31 weeks to estimate the parameters of the model, and then we compare the model outcome with the rest of the data over the remaining 21 weeks. We show that our model fits the real data well. The second poll data are collected over 115 weeks. We again estimate the parameters of the model, and we show that the model can predict the poll outcome in the following weeks and thus indicate whether some control strategies are needed. Finally, we perform several computational and statistical experiments to validate the proposed model. To study the influence of the parameters on the thresholds and to identify the most influential ones, a sensitivity analysis is carried out to investigate the effect of a small perturbation of a parameter value on the value of the threshold. An uncertainty analysis is performed to evaluate the forecast inaccuracy in the outcome variable due to uncertainty in the estimation of the parameters.


Introduction
To encourage citizens to participate in political life, candidate parties must offer certain benefits, because participation requires time, knowledge, motivation, and money. Citizen participation is then guaranteed if the benefits are proportional to the individual economic, psychological, and social costs of political mobilization [1].
Public opinion is defined as the collective opinion of individuals within a society on a specific idea, even if they have several perceptions of it. For example, in current American politics, many people tend to exaggerate their opinions when they want to disagree with what they consider to be public opinion. This exaggeration leads to false perceptions of public opinion, because all opinions, exaggerated or not, contribute to forming public opinion. Candidates and the media often use opinion polls before and during election campaigns to determine which candidates are leading and which are likely to emerge victorious. These polls are intention polls for a sample of potential voters, and they reveal the expected voting shares. The expected results then allow candidates to negotiate and discuss positions based on a particular outcome. Public opinion polls are usually used to determine people's position in political life, their votes, and their political affiliations and orientations, by asking questions about their opinions, personal characteristics, and activities. The answers are collected, counted, mathematically analyzed, and interpreted. These analyses allow the candidate parties to improve their campaign strategies in order to change their position in public opinion in the long term [2].
In this article, we study how the preferences and expectations of voters compete and how voters are persuaded by the information about the candidates in opinion polls. We start by presenting the mathematical model that describes the evolution of opinions during polls in order to predict the likely result. Based on the next-generation matrix approach, we compute the thresholds of equilibrium stability. We consider two sets of data from the Reuters polling system regarding the approval rating of the U.S. President over two terms. These two weekly polls track the percentage of Americans who approve and disapprove of the way the President handles his job. To validate the underlying model, we use nonlinear least-squares regression to fit the model to the actual data. In the first survey, we use only the first 31 weeks to estimate the model parameters and then compare the rest of the data with the model result over the remaining 21 weeks. We show that our model fits the real data well. The second survey data are collected over a period of 115 weeks. We again estimate the model parameters and show that our model can predict the outcome of the survey in the coming weeks. Sensitivity analysis and model uncertainty analysis are carried out to validate the underlying model.
Developing more efficient models and improving the accuracy of the modeling process require reliable statistical methods. Sensitivity analysis is one of the most widely used of these methods in recent work [3][4][5][6]; it is used in development decisions and recommendations, in understanding and quantifying systems, in verifying the validity and accuracy of a model, and in identifying the important parameters for further studies [7]. Sensitivity analysis is a tool to evaluate the effect of changes in the value of an input parameter on the output value of a simulation model. The method we use in this article is based on the single-point derivative: one input parameter is changed while the other parameters are held fixed. We calculate the sensitivity indices of the equilibrium stability thresholds with respect to the model parameters. These indices tell us how important each parameter is to changing opinions. We also perform an uncertainty analysis of the proposed model to illustrate the relationship between the model and the observations from the real data.
The paper is organized as follows: Section 2 introduces our new model, giving some details about the interactions between the different compartments and the parameters of the model. In Section 3, we derive the basic reproduction numbers. Section 4 provides the results of the parameter estimation. In Section 5, sensitivity analysis is performed to identify the most important parameters in the proposed model for both polls. Section 6 provides the uncertainty analysis of the model parameters, and Section 7 concludes.

Presentation of the Model
Because many scenarios involve a binary decision, the poll model that we present here describes the opinions of agreement (or approval) and disagreement (or disapproval) regarding a candidate or an idea in polls. The model can also describe the political position of a specific candidate in situations with more than two candidate parties, because we can always reduce the situation to two decisions. For instance, consider four parties A, B, C, and D, and suppose we want to investigate the political position of party A. We can then examine the two subsets (A) and (B, C, and D): votes for A are considered Agree opinions, and votes for B, C, or D are considered Disagree opinions. More simply, we consider the opinion poll of voters on the performance of the party studied.
Without loss of generality, we devise a mathematical model describing the evolution of Agree and Disagree opinions during polls, and the types of surveys we consider are surveys that can be answered in agreement, disagreement, or otherwise. Thus, the targeted population by the poll is regrouped into three groups, and the model herewith has been formulated using compartments. Three compartments have been considered, and each of them has been described below.
(1) Indifferent (I) Individuals. Undecided or ambivalent people, people who do not know about the poll yet, or those who abstain from voting for personal reasons. This category has weak or nonexistent attitudes about the ideas, parties, or candidates and lacks any strong positive or negative associations.
(2) Agree/Approved (A) Individuals. People in agreement with the idea being studied.
(3) Disagree/Disapproved (D) Individuals. People in disagreement with the idea being studied.
For the modeling process, a set of assumptions has been used. The targeted population is well mixed; that is, the indifferent individuals are homogeneously spread throughout the entire population. Everyone has their reasons for agreement or disagreement, and indifferent individuals can be convinced by the reasons of agreeing people at a rate β 1 or by the reasons of disagreeing individuals at a rate β 2 . Agreeing individuals can be convinced by disagreeing people at a rate α 1 , and disagreeing people can be convinced by agreeing individuals at a rate α 2 . People can also abstain from voting or lose interest without any direct contact with individuals from the opposite opinion group: agreeing people become indifferent at the rate γ 1 , and disagreeing people become indifferent at the rate γ 2 . All contacts are modeled by the standard incidence rate. A flowchart describing the different interactions between the compartments of the model is presented in Figure 1.
All these assumptions and considerations are written as the following system of ordinary differential equations:
\[ I' = -\beta_1 \frac{A I}{N} - \beta_2 \frac{D I}{N} + \gamma_1 A + \gamma_2 D, \tag{1} \]
\[ A' = \beta_1 \frac{A I}{N} + (\alpha_2 - \alpha_1) \frac{A D}{N} - \gamma_1 A, \tag{2} \]
\[ D' = \beta_2 \frac{D I}{N} + (\alpha_1 - \alpha_2) \frac{A D}{N} - \gamma_2 D, \tag{3} \]
where I(0) ≥ 0, A(0) ≥ 0, and D(0) ≥ 0, and N = I + A + D. Note that N′ = I′ + A′ + D′ = 0; thus, the population size N is constant in time. A summary of the parameter descriptions is given in Table 1.
Sensitivity analysis allows us to measure the relative change in a state variable when a parameter changes while the other parameters are held fixed at constant values. Later, we carry out a local sensitivity analysis of the thresholds with respect to the model parameters. Next, we determine the covariance matrix, which provides the uncertainties in the estimation of the parameters and captures relations in the measurement uncertainties based on two sets of data. To do this, we start by computing the model thresholds in the next section.
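The verbal description of the flows above can be turned into a short simulation. The following Python sketch is only an illustration: it assumes that the agree-to-disagree flow is α 1 AD/N and the disagree-to-agree flow is α 2 AD/N, as stated in the assumptions, and all parameter values and initial conditions are hypothetical rather than taken from Table 1. It integrates the system with SciPy's `solve_ivp` and records the total population at each output time.

```python
import numpy as np
from scipy.integrate import solve_ivp


def iad_rhs(t, y, beta1, beta2, alpha1, alpha2, gamma1, gamma2):
    """Right-hand side of the assumed I-A-D system with standard incidence."""
    I, A, D = y
    N = I + A + D
    new_agree = beta1 * A * I / N            # indifferent -> agree
    new_disagree = beta2 * D * I / N         # indifferent -> disagree
    switch = (alpha1 - alpha2) * A * D / N   # net flow agree -> disagree
    dI = -new_agree - new_disagree + gamma1 * A + gamma2 * D
    dA = new_agree - switch - gamma1 * A
    dD = new_disagree + switch - gamma2 * D
    return [dI, dA, dD]


# Hypothetical values (beta1, beta2, alpha1, alpha2, gamma1, gamma2).
params = (0.5, 0.4, 0.3, 0.2, 0.05, 0.04)
y0 = [800.0, 100.0, 100.0]                   # hypothetical I(0), A(0), D(0)
weeks = np.linspace(0.0, 52.0, 53)

sol = solve_ivp(iad_rhs, (weeks[0], weeks[-1]), y0, args=params,
                t_eval=weeks, rtol=1e-8, atol=1e-8)

# N = I + A + D should stay at its initial value because N' = 0.
totals = sol.y.sum(axis=0)
print("min/max of N over time:", totals.min(), totals.max())
```

Because the three right-hand sides sum to zero, any drift in `totals` is purely numerical, which makes it a convenient sanity check on an implementation of the model.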

Thresholds
In epidemiology, the basic reproductive number R 0 (or epidemic threshold) is defined as the average number of secondary cases of an infection produced by a "typical" infected individual during his or her entire infectious period when introduced into a population of susceptibles [8][9][10][11][12]. The threshold R 0 is characterized mathematically by viewing infection transmission as a "demographic process", in which offspring production is not giving birth in the demographic sense but causing new infections through transmission [13]. Thus, the infection process can be considered as "consecutive generations of infected individuals". Growing successive generations indicate a growing population (i.e., an epidemic), and the growth factor per generation indicates the potential for growth. The mathematical characterization of R 0 is this growth factor [13]. Generally, if R 0 > 1, an epidemic occurs, whereas if R 0 < 1, there will probably be no epidemic.
Following this definition, we define the thresholds as follows: R D0 is the average number of new disagreements produced by a disagreeing individual introduced into a population of indifferent people, during the period in which he or she holds this opinion. Similarly, R A0 is the average number of new agreements produced by an agreeing individual introduced into a population of indifferent people, during the period in which he or she holds that opinion.
For the analysis of epidemic models, the first step is to calculate the disease-free equilibrium (DFE). This equilibrium point is then used to calculate the basic reproductive number using the next-generation matrix. In this contribution, let R X0 be the threshold of growth of the opinion X (either X = A "Agree" or X = D "Disagree"). Then, R D0 is the threshold associated with the Disagree-free equilibrium while R A0 is the one associated with the Agree-free equilibrium.
To use the next-generation matrix method, we first need to determine the Disagree-free equilibrium and the Agree-free equilibrium; we can then derive the associated thresholds. Recall (see [8]) that the equilibrium points of system (1)-(3) are the solutions of I′ = A′ = D′ = 0.

Abstract and Applied Analysis
For the Disagree-free equilibrium, when there is no negative opinion, that is, when D = 0, we have
\[ \beta_1 \frac{A^* I^*}{N} - \gamma_1 A^* = 0, \qquad I^* + A^* = N, \]
where I * and A * represent the numbers of indifferent and agreeing individuals, respectively, in the absence of disagreeing people. Hence, for the system governed by (1)-(3), the Disagree-free equilibrium is
\[ (I^*, A^*, 0) = \left( \frac{\gamma_1}{\beta_1} N,\ \left( 1 - \frac{\gamma_1}{\beta_1} \right) N,\ 0 \right). \]
Following the next-generation approach [13], we compute the threshold R D0 associated with the Disagree-free equilibrium.
Regarding the Agree-free equilibrium, when there is no positive opinion, i.e., when A = 0, we have
\[ \beta_2 \frac{D^* I^*}{N} - \gamma_2 D^* = 0, \qquad I^* + D^* = N, \]
where I * and D * represent the numbers of indifferent and disagreeing individuals, respectively, in the absence of agreeing people. Therefore, for the system governed by (1)-(3), the Agree-free equilibrium is
\[ (I^*, 0, D^*) = \left( \frac{\gamma_2}{\beta_2} N,\ 0,\ \left( 1 - \frac{\gamma_2}{\beta_2} \right) N \right). \]
Following the next-generation approach [13] again, we compute the threshold R A0 associated with the Agree-free equilibrium.

Parameter Estimation
Methods.
The approval rating is the percentage, determined by the survey process, of respondents who agree with a particular individual or program. Typically, an approval rating is assigned to a politician based on the answers to a public opinion poll in which a sample of people is asked whether they approve or disapprove of this particular political figure. Because the data are available in the form of proportions, we need to work in terms of the proportions of Indifferent, Approved, and Disapproved individuals; thus, let i = I/N, a = A/N, and d = D/N denote the fractions of the classes I, A, and D in the population, respectively. By replacing I by i, A by a, and D by d, we obtain the normalized system (11)-(13), in which i + a + d = 1.
To check how realistic this model is, we use nonlinear least-squares regression to fit the model to actual observations, following the method used in [15], where the authors estimated the parameters of two influenza epidemic models. The least-squares method estimates the unknown parameters by minimizing the sum of the squared differences between the measurements and the model predictions.
Therefore, the following process has been followed for parameter estimation. Using the MATLAB ode45 routine, the system of ordinary differential equations is solved numerically, with initially chosen values for the parameters and state variables from Table 1. The model outcomes are compared with the field data, and the Levenberg-Marquardt optimization algorithm determines a new set of parameter values for which the model outcomes fit the field data better [16,17]. After the optimizer has determined the new parameter values, the system of ordinary differential equations is solved numerically with these new values, and the model outcomes are compared again with the field data. This iteration between parameter updating and numerical solution of the system of ordinary differential equations using the Runge-Kutta method [18,19] continues until the convergence criteria for the parameters are met. In this process of estimating parameters, about one thousand values are chosen at random for each of the parameters to be estimated.
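The solve-compare-update loop described above can be sketched in Python. This is a minimal illustration under several assumptions, not the authors' code: SciPy's `solve_ivp` (whose default RK45 method is a Dormand-Prince pair, like ode45) and `least_squares(method="lm")` stand in for the MATLAB routines, the data are synthetic rather than the Reuters polls, and the names `rhs`, `simulate`, and `residuals` are hypothetical. Only the difference α = α 1 − α 2 is fitted, because in the normalized equations assumed here the two rates enter only through that difference.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def rhs(t, y, b1, b2, alpha, g1, g2):
    """Assumed normalized system in proportions (i, a, d), alpha = alpha1 - alpha2."""
    i, a, d = y
    di = -b1 * a * i - b2 * d * i + g1 * a + g2 * d
    da = b1 * a * i - alpha * a * d - g1 * a
    dd = b2 * d * i + alpha * a * d - g2 * d
    return [di, da, dd]


def simulate(theta, t, y0):
    """Solve the ODE system and return a (3, len(t)) array of model outputs."""
    sol = solve_ivp(rhs, (t[0], t[-1]), y0, args=tuple(theta),
                    t_eval=t, rtol=1e-10, atol=1e-12)
    return sol.y


def residuals(theta, t, y0, data):
    """Stacked differences between model outputs and observations."""
    return (simulate(theta, t, y0) - data).ravel()


# Synthetic "poll": 31 weekly observations generated from hypothetical parameters.
t = np.arange(0.0, 31.0)
y0 = [0.2, 0.4, 0.4]
theta_true = np.array([0.5, 0.4, 0.1, 0.05, 0.04])
rng = np.random.default_rng(0)
data = simulate(theta_true, t, y0) + 0.005 * rng.standard_normal((3, t.size))

theta0 = np.array([0.4, 0.5, 0.05, 0.08, 0.06])    # initial guess
cost0 = 0.5 * np.sum(residuals(theta0, t, y0, data) ** 2)

fit = least_squares(residuals, theta0, args=(t, y0, data), method="lm")
print("initial cost:", cost0, "final cost:", fit.cost)
print("estimated parameters:", fit.x)
```

To mimic the random restarts mentioned in the text, the `least_squares` call could be repeated from many randomly drawn `theta0` vectors, keeping the fit with the smallest cost.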

First Poll.
To estimate the model parameters, we use the data from a poll carried out by the Reuters polling system entitled "Approval of Obama's handling of job as president" from [14]. The data in Table 2 are collected for approximately 52 weeks, from January 17, 2016, to January 8, 2017. The poll is answered with one of the following three options: approve, disapprove, and mixed feelings. Here, we consider the third class, mixed feelings, as indifferent individuals, see Figure 2.
To validate our model, we chose the period from 17/01/2016 to 21/08/2016 (31 weeks) for parameter estimation, see Figure 3. The estimated parameter values are then used to validate the model by plotting the three states of the model against the rest of the real data, from 28/08/2016 to 08/01/2017 (20 weeks), see Figure 4. It can be seen from this figure that the model fits the real data well, and that it can predict the distribution of opinions at later times and hence the outcome of the survey.
Based on the parameters estimated in the first period, and using the results of [8], we can determine which equilibrium point is stable and thus predict the long-term outcome of the survey. Using the parameter values from Table 3 to calculate the thresholds corresponding to this poll, we get R A0 = 0.9943 < 1. We can also see that β 1 > γ 1 and β 2 > γ 2 , which means that the model has two equilibria that are stable at the same time, namely, the Disagree-free equilibrium e 1 = (0.0004, 0.9996, 0) and the Agree-free equilibrium e 2 = (0.0635, 0, 0.9365), see [8] for more details. However, knowing the stability of the equilibria alone does not determine the result of the survey, because the convergence of the model states towards an equilibrium can take a long time, and voting could end before this convergence. In this situation of bistability, predicting the poll outcome could be more difficult.
[Figure caption: data from [14], entitled "Overall, do you approve or disapprove about the way Obama is handling his job as President?"; see Table 2 for more details.]
Despite that, these results provide insight into the distribution of people's opinions and the course of events, which can help the authorities take the necessary measures to bring the situation under control. For example, with about 30 weeks of data, we can mathematically estimate the result of the survey 20 weeks before the end of the poll, see Figure 4. Thus, our results are useful for predicting the outcome of the poll. We can see from Figure 4 that the Disagree-free equilibrium is the more attractive one, because D has started to tend towards 0. This means that after this decrease, there is no further need for interventions.
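The two boundary equilibria quoted above can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the normalized form of the model with the flows described in the model section, and the parameter values are hypothetical, chosen only so that γ 1 /β 1 = 0.0004 and γ 2 /β 2 = 0.0635 reproduce the first coordinates of e 1 and e 2 ; they are not the Table 3 estimates. The helper names are also hypothetical. For each equilibrium it evaluates the per-capita growth rate of a small group holding the absent opinion; a negative value means that such a group shrinks.

```python
import numpy as np


def rhs(y, b1, b2, a1, a2, g1, g2):
    """Assumed normalized I-A-D system in proportions (i, a, d)."""
    i, a, d = y
    di = -b1 * a * i - b2 * d * i + g1 * a + g2 * d
    da = b1 * a * i + (a2 - a1) * a * d - g1 * a
    dd = b2 * d * i + (a1 - a2) * a * d - g2 * d
    return np.array([di, da, dd])


def boundary_equilibria(b1, b2, g1, g2):
    """Disagree-free and Agree-free equilibria (require b1 > g1 and b2 > g2)."""
    e1 = np.array([g1 / b1, 1.0 - g1 / b1, 0.0])   # Disagree-free
    e2 = np.array([g2 / b2, 0.0, 1.0 - g2 / b2])   # Agree-free
    return e1, e2


# Hypothetical parameters: beta1, beta2, alpha1, alpha2, gamma1, gamma2.
b1, b2, a1, a2, g1, g2 = 0.5, 1.0, 0.15, 0.10, 0.0002, 0.0635
e1, e2 = boundary_equilibria(b1, b2, g1, g2)

# Per-capita growth rate of the absent opinion near each equilibrium.
growth_D_at_e1 = b2 * e1[0] + (a1 - a2) * e1[1] - g2
growth_A_at_e2 = b1 * e2[0] - (a1 - a2) * e2[2] - g1

print("e1 =", e1, "growth of D near e1:", growth_D_at_e1)
print("e2 =", e2, "growth of A near e2:", growth_A_at_e2)
```

With these illustrative values both growth rates are negative, which is one way a bistable situation of the kind described in the text can arise.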

Second Poll.
A second poll was also carried out by the Reuters polling system, entitled "Overall, do you approve or disapprove about the way Donald Trump is handling his job as President?" from [20]. The data in Table 4 are collected over 115 weeks. The survey is also answered with one of the following three options: approve, disapprove, and mixed feelings. These data are plotted against time in Figure 5. Figure 5 suggests convergence to a positive equilibrium e * , see [8] for more details about equilibria. Since i(t) remains below 10% and d(t) is greater than a(t) from the beginning of March 2017 until the end of the poll on 07 April 2019, we use the data plotted in Figure 5 to estimate the parameters of model (11)-(13) for this poll.
Estimated parameter values of model (11)-(13) for this poll are given in Table 5. We use the parameter values from this table to calculate the thresholds corresponding to this poll. We can see that β 1 γ 2 − β 2 γ 1 = −0.0799 < 0 and α = α 1 − α 2 > 0, which means that the equilibrium e * = (0.0651, 0.3868, 0.5480) is stable, see the proof of Proposition 5 and Remark 2 in [8]. In this case, some control strategies are necessary to change the course of events prior to Election Day. Figure 6 illustrates the quality of the parameter estimation. In (a), the model curve for indifferent individuals is plotted with a solid line and the corresponding data with small crosses. In (b), the model curve for agreeing individuals is plotted with a solid line and the corresponding data with small crosses. In (c), the estimated curve for disagreeing individuals is plotted with a solid line and the corresponding data with crosses. In Figure 7, the three functions of the model and the three states of the real data are plotted with different patterns to show that the model fits the data well.

Sensitivity Analysis
Sensitivity analysis is measured using the sensitivity index. Sensitivity indices allow us to measure the relative change in a state variable when a parameter changes while the other parameters are retained fixed at constant values. The normalized forward sensitivity index of a variable to a parameter is the ratio of the relative change in the variable to the relative change in the parameter. When the variable is a differentiable function of the parameter, the sensitivity index may be alternatively defined using partial derivatives.
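When the output u is differentiable in a parameter p, the normalized forward sensitivity index is commonly written as (∂u/∂p)(p/u). The Python sketch below approximates it with a central finite difference. The function `toy_threshold` is a made-up stand-in with a known answer, not the R A0 or R D0 of this model, and the helper name `sensitivity_index` is hypothetical.

```python
def sensitivity_index(f, params, name, rel_step=1e-6):
    """Approximate (df/dp) * (p / f) for parameter `name` by central differences.

    Assumes params[name] and f(**params) are both nonzero.
    """
    p = params[name]
    h = rel_step * abs(p)
    up = dict(params)
    up[name] = p + h
    down = dict(params)
    down[name] = p - h
    dfdp = (f(**up) - f(**down)) / (2.0 * h)
    return dfdp * p / f(**params)


def toy_threshold(beta, gamma, alpha):
    """Illustrative threshold-like ratio used only to exercise the helper."""
    return beta / (gamma + alpha)


params = {"beta": 0.5, "gamma": 0.05, "alpha": 0.2}
indices = {k: sensitivity_index(toy_threshold, params, k) for k in params}
for k, v in indices.items():
    print(k, round(v, 4))
```

For this toy ratio the exact indices are 1 for `beta`, −γ/(γ + α) for `gamma`, and −α/(γ + α) for `alpha`, so an index of −0.8 would mean that a 1% increase in `alpha` lowers the output by about 0.8%.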
A summary of sensitivity indices for R D0 and R A0 corresponding to the first poll is given in Figure 8, and to the second poll is given in Figure 9.

Model Uncertainty
A covariance matrix estimates the variability of the model parameters and hence random disturbances in the output. Thus, covariance matrices contain information about the uncertainties in the model output. All covariance matrices are symmetric. The diagonal elements of the covariance matrix are the variances of the individual parameter estimates and give information on their accuracy. The off-diagonal elements can be used to study the interrelationships between parameters. If the covariance between two coefficients is positive, their values tend to vary in the same direction. On the other hand, if the covariance between two coefficients is negative, their values tend to move in opposite directions.
Covariance matrices may contain both negative and positive elements: a negative covariance means that when one parameter's value increases, the other's tends to decrease, while a positive covariance means that when one parameter's value increases, the other's also tends to increase. The covariance matrix Σ 1 contains only positive elements. This means that these parameters' values vary in the same direction.
As shown in the covariance matrix Σ 1 , the covariance between transmission coefficients β 1 and β 2 is 0.9856, the covariance between transmission coefficients β 1 and α 1 is 2703.3959, the covariance between transmission coefficients β 1 and α 2 is 2703.3971, the covariance between the transmission coefficient β 1 and the parameter γ 1 is 0.0609, and the covariance between the transmission coefficient β 1 and the parameter γ 2 is 0.0494.
The covariance between the transmission coefficient β 2 and the parameter α 1 is 27274.0064, the covariance between the transmission coefficient β 2 and the parameter α 2 is 27274.0112, the covariance between the transmission coefficient β 2 and the parameter γ 1 is 0.0467, and the covariance between the transmission coefficient β 2 and the parameter γ 2 is 0.2942.
The covariance matrix Σ 2 given in the Appendix contains negative and positive elements.
As shown in the covariance matrix Σ 2 , the covariance between transmission coefficients β 1 and β 2 is positive (0.000104), the covariance between transmission coefficients β 1 and α 1 is positive (64.542655), the covariance between transmission coefficients β 1 and α 2 is positive (64.540946), the covariance between the transmission coefficient β 1 and the parameter γ 1 is positive (0.001141), the covariance between the transmission coefficient β 2 and the parameter γ 2 is positive (0.000968), the covariance between transmission coefficients α 1 and α 2 is positive (10927142.47), the covariance between the transmission coefficient α 1 and the parameter γ 1 is positive (55.366504), and the covariance between the transmission coefficient α 2 and the parameter γ 1 is positive (55.3651). This means that these parameters' values vary in the same direction.
The covariance between the transmission coefficient β 1 and the parameter γ 2 is negative (−0.000676), the covariance between transmission coefficients β 2 and α 1 is negative (−67.701944), the covariance between transmission coefficients β 2 and α 2 is negative (−67.700578), the covariance between the transmission coefficient β 2 and the parameter γ 1 is negative (−0.000362), the covariance between the transmission coefficient α 1 and the parameter γ 2 is negative (−70.976139), the covariance between the transmission coefficient α 2 and the parameter γ 2 is negative (−70.974401), and the covariance between the parameters γ 1 and γ 2 is negative (−0.00067). This reflects that these parameters' values tend to vary in opposite directions. Table 6 shows the summary of covariance of the Agree-Disagree model parameters for the poll data.
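The text does not state how the covariance matrices were computed. One common approach after a least-squares fit, shown here only as an assumption, is the approximation s² (JᵀJ)⁻¹, where J is the Jacobian of the residuals at the estimate and s² is the residual sum of squares divided by the degrees of freedom. The Python sketch below applies it to a small straight-line fit with synthetic data (not the poll data) and also rescales the result to a correlation matrix, which is easier to read when the parameters have very different magnitudes. The function names are hypothetical.

```python
import numpy as np


def parameter_covariance(jac, res):
    """Approximate covariance s^2 (J^T J)^{-1} from a least-squares fit."""
    m, n = jac.shape
    s2 = res @ res / (m - n)              # residual variance estimate
    return s2 * np.linalg.inv(jac.T @ jac)


def correlation_from_covariance(cov):
    """Rescale a covariance matrix so that its diagonal equals 1."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


# Synthetic straight-line data: y = 2 + 0.5 x + noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 + 0.5 * x + 0.1 * rng.standard_normal(x.size)

J = np.column_stack([np.ones_like(x), x])   # Jacobian of the linear model
theta, *_ = np.linalg.lstsq(J, y, rcond=None)
res = y - J @ theta

cov = parameter_covariance(J, res)
corr = correlation_from_covariance(cov)
print("covariance:\n", cov)
print("correlation:\n", corr)
```

For an ODE model fitted with `scipy.optimize.least_squares`, the `jac` and `fun` attributes of the returned result could play the roles of `J` and `res`. In this toy example the intercept and slope estimates are negatively correlated: raising one can be partly compensated by lowering the other.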

Conclusion
In this paper, an IAD-type compartmental model has been considered to explore agree-disagree opinions during polls. The next-generation matrix method is used to derive the thresholds of equilibrium stability R A0 and R D0 . Two sets of data from the Reuters polling system regarding the approval rating of the U.S. President over two terms are used to estimate the parameters and to validate the underlying model.
To identify the most influential parameters in the proposed model, a local sensitivity analysis is carried out. We calculated sensitivity indices based on the estimated parameters to identify the most influential parameters for both polls. We found in the first poll that the most important parameters in determining the threshold R D0 are α 1 (the disagree to agree transmission rate), α 2 (the agree to disagree transmission rate), and γ 2 (the rate of loss of interest of the disagreeing people), while the most influential parameters for the threshold R A0 are α 1 and α 2 .