Electoral Forecasting Using a Novel Temporal Attenuation Model: Predicting the US Presidential Elections

Electoral forecasting is an ongoing scientific challenge with high social impact, as current data-driven methods try to efficiently combine statistics with economic indices and machine learning. However, recent studies in network science point to the importance of temporal characteristics in the diffusion of opinion. As such, we combine concepts of micro-scale opinion dynamics and temporal epidemics, and develop a novel macro-scale temporal attenuation (TA) model, which uses pre-election poll data to improve forecasting accuracy. Our hypothesis is that the timing of publicizing opinion polls plays a significant role in how opinion oscillates, especially right before elections. Thus, we define the momentum of opinion as a temporal function which bounces up when opinion is injected in a multi-opinion system of voters, and dampens during states of relaxation. We validate TA on survey data from the US Presidential Elections between 1968 and 2016, and TA outperforms statistical methods, as well as the best pollsters at their time, in 10 out of 13 presidential elections. We present two different implementations of the TA model, which accumulate an average forecasting error of 2.87-3.28 points over the 48-year period. Conversely, statistical methods accumulate 7.48 points error, and the best pollsters accumulate 3.64 points. Overall, TA offers increases of 23-37% in forecasting performance compared to the state of the art. We show that the effectiveness of TA does not drop when relatively few polls are available; moreover, with increasing availability of pre-election surveys, we believe that our TA model will become a reference alongside other modern election forecasting techniques.


Introduction
Understanding the dynamics of information, which shapes many aspects of our social lives, is a major research drive in our increasingly networked society [1,2,3,4]. Be it under the form of a commercial, a rumour, a virus, or a blog post, information diffusion (or propagation) receives substantial attention from multidisciplinary fields of research [5,6,7]. Out of these, the ability to predict election outcomes is just one of many areas of research that sees benefits from cutting-edge investigation techniques, like social media analysis and data science [8,9,10].
Research on election forecasting originally relied on classic statistical models applied to opinion polls taken before election day [11,12]. Since the late 1970s, it has been established that the timing of the election date can be crucial for the outcome [12]. Election forecasting, especially in the case of presidential elections, employs so-called macromodels [13]; these are statistical models based on national economic and political fluctuations. On the other hand, micromodels are models based on surveys of individual voters during the pre-election period [14]. The current state of the art in forecasting employs multilevel regression and post-stratification (MRP) [15,16,17]. Several reputable institutions in the United States are dedicated to the analysis of election data using variants of MRP, such as Real Clear Politics, Huffington Post, FiveThirtyEight, Daily Kos, or Understanding America Study. Abstracting from the specific methodologies employed by these platforms, we summarize them as follows:
• Poll weighting/averaging: polls from different sources are weighted based on the pollster credibility.
• Poll adjustment: the number of likely voters, convention influence, and omission of third party candidates are taken into account.
• Adding demographic and optional economic data: used to scale surveys at state and national levels.
• Simulating: using a probabilistic distribution to account for uncertainty in the data.
Apart from techniques like MRP, we find studies showing that alternative subjective surveying methods may also be efficient forecasters. Thus, the American National Election Surveys from 1956 to 1996 show that voters could themselves better forecast who will win the presidential elections [14]. Another study shows that quick and unreflective facial judgments of gubernatorial candidates are more accurate in predicting the winners than deliberating on the competence of each candidate [18]. In essence, voter forecasting models derived from vote expectations represent a promising alternative to classic statistical approaches [19].
Turning to social networks and media, there is a scientific debate on how the wide coverage of publicized opinion polls in media can affect voters before an election [20]. It is already known that social networks have a decisive role in the diffusion of information, and have proven to be very powerful in many situations involving macroscopic behavior [21,8]. Examples include their decisive influence on the Arab Spring in 2010 [22,23] and on the U.S. presidential elections in 2008 [24] and 2012 [25]. Analyzing the dynamics of this social layer can offer substantial predictive power over the real-world social networks it models. Studies on diffusion predictability are further found in marketing and public relations [7,26], epidemic spreading [27], hurricane forecasting [28], or forecasting box-office revenues of movies [29].
Network science proposes understanding diffusion processes by designing interactions at micro-scale level (i.e., between individual social agents), and forecasting opinion evolution at macro-scale level [30]. Specifically, the macroscopic behavior is inferred by: (i) monitoring when social agents become indoctrinated by their neighborhood (i.e., they adopt information, get infected, buy merchandise [5,7,31]), then (ii) predicting how cascades of information flow, and how the diffusion process percolates through individuals. Nevertheless, temporal aspects are shown to play an essential role in the diffusion of influence [9,10], as does the timing of publicizing opinion polls [20] and even of organizing elections [12]. Consequently, this paper builds upon the premise that we are able to extrapolate the macroscopic behavior of a society (here, in the context of elections) by inferring microscopic temporal dynamic models during the pre-election period. Thus, our contributions in this paper are:
• We formulate an analytic methodology for modeling the macro-scale evolution of a multi-opinion system, targeting better election poll forecasting.
• We define an experimental setup, based on pre-election data, to validate the underlying assumptions of our approach.
• We present a comprehensive case study on US Presidential Elections to measure the efficiency of our approach against state of the art forecasting estimates, including MRP, as recently used by the best pollsters in the USA.
• We explore the feasibility of applying TA in real time, during an ongoing pre-election period, and compare its performance to MRP at several points in time, relative to the election day.

The temporal attenuation model
Application of TA starts by gathering a set of pre-election multi-opinion polls with temporal information, i.e., corresponding to specific relative dates before an election. Based on the number of candidates for election, we define temporal poll vectors $p_i(t)$, where $i$ is the index of the candidate in the multi-opinion system, and $t$ is the time (date) of the poll. We define as opinion injection, at any time $0 \le t < d$, all discrete observations which stem from the public opinion polls $p_i(t)$ preceding the election day $d$. We further define a discrete temporal election axis $t \in [0, d)$, relative to the date of the first opinion poll (which becomes $p_i(t = 0)$) and extending to the last opinion poll prior to the election day $t = d$.
An essential aspect of TA is to reproduce the real-world timing when injecting opinion, i.e., in this paper, at the level of day, as explained in Figure 1a. As such, we do not condense consecutive polls, one after another, but scatter them along the temporal axis in order to model bounces and periods of relaxation mirroring the real-world general opinion.
Given the available poll vectors $p_i(t)$, we construct dataset $P$ consisting of the daily opinion corresponding to each candidate. Since there may be days without available polls, we compensate for every such day $0 < k < d$ by adding a 0 (no vote) for each candidate. Consequently, we can describe dataset $P$ and an individual poll vector $p_i(t)$ as:
$$P = \{\, p_i(t) \mid 0 \le t < d \,\} \tag{1}$$
$$p_i(t) = \begin{cases} \mathrm{norm}\left(p^*_i(t)\right) & \text{if a poll is available on day } t \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
Equation 1 specifies that $P$ consists of continuous (daily) poll vectors for $0 \le t < d$. Equation 2 specifies that if a poll is available for a specific day, then we calculate a normalized opinion value based on the raw poll data $p^*$ (e.g., number of voters, vote percentages). In the validation data we have polls ranging from a few hundred to tens of thousands of voters, so a normalization of the raw amplitudes is recommended.
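The data preparation described above can be sketched in Python as follows. This is a minimal illustration rather than the reference implementation: the function name `build_daily_polls`, the `(date, raw votes)` input layout, and the choice to normalize each poll by its total number of raw votes are assumptions made for this example.

```python
from datetime import date

def build_daily_polls(polls, election_day):
    """Build the daily dataset P from raw, dated polls.

    polls: list of (date, [raw votes per candidate]) tuples (hypothetical layout).
    Returns a list of d vectors, one per relative day t in [0, d).
    """
    polls = sorted(polls, key=lambda item: item[0])   # sort polls by date
    first_day = polls[0][0]                           # first poll defines t = 0
    d = (election_day - first_day).days               # election day is t = d
    n = len(polls[0][1])
    P = [[0.0] * n for _ in range(d)]                 # days without polls stay at 0
    for poll_date, raw in polls:
        t = (poll_date - first_day).days
        if not 0 <= t < d:
            continue                                  # skip polls on/after election day
        total = sum(raw)
        # Assumed normalization: each candidate's share of the poll's raw total.
        # If several polls share a day, the last one overwrites the others here.
        P[t] = [v / total for v in raw]
    return P
```

With this layout, a poll of 1,000 respondents and a poll of 100 respondents inject opinion on the same scale, which is the purpose of normalizing the raw amplitudes.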

Micro-scale interactions
Most existing micro-scale models for opinion injection rely on fixed thresholds, or thresholds evolving according to simple probabilistic processes [32,33,34], and do not account for any temporal aspects [35]. Nonetheless, studies in epidemiology propose three popular parametric models for the likelihood of disease transmission rates, considering time as a parameter [36,37]: power-law, exponential, and Rayleigh. These functions model the temporal damping (fading) of infectiousness after exposure, yet they may also be used to trace the evolution of opinion after each injection.
In this paper, we introduce the power-law (PTA) and exponential (ETA) temporal attenuation models, and analyze their efficiency. In their continuous epidemiological formulation [37], these models express the transmission likelihood $\lambda_i(t)$ of a disease at a relative time $\Delta t$ after an individual $i$ was infected, as $\lambda_i(t) = \alpha_i \cdot \Delta t^{-\beta_i}$ for PTA, and $\lambda_i(t) = \alpha_i \cdot e^{-\Delta t \beta_i}$ for ETA.
Additionally, we parameterize the two TA models with an amplitude factor $\alpha_i$ and a damping factor $\beta_i$, specific to every candidate $i$. The $\alpha_i$ factor determines the amplitude of the positive bounce when opinion is injected, and the $\beta_i$ factor controls the damping speed towards the relaxed state ($\lambda(t \to \infty) = 0$) for any candidate (see Figure 1b).
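To make the role of the two parameters concrete, the following short Python sketch evaluates both attenuation functions. The function names `pta_kernel` and `eta_kernel` are illustrative, and restricting the power law to $\Delta t > 0$ is an assumption needed to keep it finite.

```python
import math

def pta_kernel(alpha, beta, dt):
    """Power-law attenuation alpha * dt**(-beta); requires dt > 0."""
    return alpha * dt ** (-beta)

def eta_kernel(alpha, beta, dt):
    """Exponential attenuation alpha * exp(-beta * dt)."""
    return alpha * math.exp(-beta * dt)

# A higher damping factor fades the same initial bounce faster.
for beta in (0.5, 1.0, 2.0):
    decay = [round(pta_kernel(1.0, beta, dt), 3) for dt in range(1, 6)]
    print(f"PTA beta={beta}: {decay}")
```

In both functions `alpha` only scales the curve, while `beta` sets how quickly it relaxes towards zero.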

Macro-scale emergent behavior
Based on the introduced temporal micro-scale models, we define the concept of opinion momentum $M_i(t)$. The momentum of each candidate $i$ is an aggregated macro-scale estimator for the evolution of opinion of the entire voter system. We extrapolate $M_i(t)$ for PTA and ETA as follows:
$$M_i(t) = \alpha_i(t) \cdot \Delta t^{-\beta_i} \ \text{(PTA)}, \qquad M_i(t) = \alpha_i(t) \cdot e^{-\Delta t \beta_i} \ \text{(ETA)}$$
The amplitude $\alpha_i$ evolves according to the following rules. During a relaxation state, when there is no opinion injection at moment $t$ (i.e., $p_i(t) = 0$), $\alpha_i$ remains unchanged. Consequently, as we progress on the discrete time axis ($t \to t+1$), momentum $M_i(t)$ will decrease. On the other hand, if opinion is injected and we have a poll $p_i(t) > 0$ at the current moment, then $\alpha_i$ is increased by an amplitude proportional to the normalized number of votes $p_i(t)$. These two rules define the evolution in time of $\alpha_i$.
By simulating the evolution of each opinion momentum in time, we can infer the current opinion $\Omega_i$ by normalizing the momentums of each candidate $i$ as follows:
$$\Omega_i(t) = \frac{M_i(t)}{\sum_j M_j(t)}$$
The process of evolving momentums $M_i(t)$ and opinions $\Omega_i(t)$ is detailed on a proof of concept voting system in Figure 1. A detailed comparative analysis of TA is provided in Appendix A.
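The momentum and opinion updates can be sketched in Python as below. This is one possible reading, not necessarily the exact update used in the paper. It assumes that (a) on a poll day the new amplitude is the momentum left from the previous day plus the injected normalized votes, (b) $\Delta t$ counts days since the last injection and restarts at 1, and (c) all candidates share one $\beta$. The function name `simulate_ta` is hypothetical.

```python
import math

def simulate_ta(P, beta, kernel="eta"):
    """Return daily opinions omega[t][i] from daily normalized polls P[t][i]."""
    n = len(P[0])
    alpha = [0.0] * n   # current amplitude per candidate
    dt = [1] * n        # days since the last injection (assumed to start at 1)
    M = [0.0] * n       # current momentum per candidate
    omega = []
    for day in P:
        for i, p in enumerate(day):
            if p > 0:
                # Assumption: the bounce adds the injected opinion to the
                # momentum remaining from the previous day, and damping restarts.
                alpha[i] = M[i] + p
                dt[i] = 1
            else:
                dt[i] += 1  # relaxation: amplitude unchanged, damping continues
            if kernel == "pta":
                M[i] = alpha[i] * dt[i] ** (-beta)
            else:
                M[i] = alpha[i] * math.exp(-beta * dt[i])
        total = sum(M)
        # Assumption: with no momentum at all, report an even split.
        omega.append([m / total for m in M] if total > 0 else [1.0 / n] * n)
    return omega

# Two candidates with equal total votes but different timing:
# the candidate polled last holds the larger share on the final day.
P = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
print(simulate_ta(P, beta=0.78, kernel="eta")[-1])
print(simulate_ta(P, beta=1.1, kernel="pta")[-1])
```

Under these assumptions, the timing of a poll matters as much as its size: an early bounce has partly faded by the time a later poll injects opinion for the competitor.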

Figure 1: Overview of the temporal attenuation (TA) model applied on two candidates (C1-blue and C2-red) receiving an equal number of votes (i.e., suggesting a 50-50% tie) over 6 days before election, but according to different temporal patterns. (A) Surveys are collected for the two candidates at t = {2, 3, 5, 6}. From these, set $P$ is assembled, consisting of poll vectors $p_0(t)$ and $p_1(t)$. (B) Impact of damping factor $\beta$ on a TA function with one spike at t = 2, followed by a continuous relaxation state. A higher $\beta$ translates into a more abrupt damping of the momentum. (C) Momentum $M_i(t)$ evolution for PTA and ETA corresponding to the poll vectors $p_0(t)$ and $p_1(t)$. Individual votes are displayed in absolute value on the graphs. The simulation using dataset $P$ corresponds to the pre-election period. (D) Opinion $\Omega_i(t)$ evolution for PTA and ETA corresponding to the momentums in panel (C). Several poll differences are displayed at t = {2, 4, 6} using the color of the virtual winner at that moment.

Temporal attenuation algorithm
Combining all the introduced terms, we present the flowchart of applying TA on a pre-election dataset in Figure 2, and provide the supporting pseudocode in Algorithm 1. The required input is a dataset consisting of pre-election polls, where each poll expresses opinion for each candidate $i$ in the multi-opinion system. The first stage of the algorithm (data preparation), depicted in Figure 2a, creates the intermediary dataset $P$ consisting of daily poll vectors $p_i(t)$ for each day $0 \le t < d$. The second stage of the algorithm (temporal attenuation), depicted in Figure 2b, computes the momentum of opinion $M_i(t)$ from $P$ using the amplitude $\alpha_i(t)$, damping factor $\beta_i$, and daily poll vectors $p_i(t)$. The output, represented by the daily opinion evolution $\Omega_i(t)$, is computed from each momentum $M_i(t)$.
Algorithm 1 Electoral forecasting algorithm based on temporal attenuation (TA)
Input: Pre-election polls for each candidate i, with timestamp
Stage A: Data preparation
    sort polls by date in increasing order
    assign first poll's date as day t ← 0
    assign election date as day t ← d
    assign relative day 0 ≤ t < d for ∀ pre-election poll → p_i(t)
    for each day t ∈ [0, d) assign p_i(t) according to Equation 2 → P
Stage B: Temporal attenuation
    for t ∈ [0, d) do
        for ∀ candidate i in the multi-opinion system P do
            compute opinion momentum M_i(t)
        end for
        compute opinion Ω_i(t)
    end for
Output: Evolution of daily opinion Ω_i(t) towards each candidate i in time 0 ≤ t < d

Experimental validation
Prediction models, where microscopic interactions are considered, consistently represent data as information cascades triggered by so-called opinion sources (also spreader nodes, stubborn agents, vital nodes) [32,33,34]. While these interactions are meaningful, accounting for all of them is still technologically, ethically, and legally impossible (e.g., analyzing all tweets posted by all users in the USA before an election). As such, we use ubiquitous data under the form of pre-election opinion polls, which are centralized and open, gathered from Real Clear Politics, Understanding America Study, and Daily Kos (detailed in Methods).
We present a case study on the US presidential elections between 1968 and 2016 in order to showcase the superior performance of our TA method. For each election year, we compare the poll estimates obtained with our PTA and ETA methods to the statistical methods of survey averaging (SA) and cumulative vote counting (CC) (see Methods for details), and to the best pollster estimations at the respective time.
Given the nature of the US presidential election system, we validate TA on a three-candidate system. We refer to these as the Democratic (D), Republican (R) and "other" (O) candidates. The ground truth for forecasting validation is the actual result of each respective election.

Figure 2: Flowchart of applying TA on a pre-election dataset. (A) The data preparation stage takes as input a set of pre-election polls with timestamp, and produces the daily opinion set of poll vectors $P$ as an intermediary output. Polls are first sorted by date from first to last, then the first date is assigned as relative day t = 0, while the election day is assigned as relative day t = d. Each other poll gets assigned a relative day $0 \le t < d$. $P$ is created by either setting a normalized poll for that day ($p_i(t)$), or, if no poll is available, an empty poll vector {0, ..., 0}. (B) The temporal attenuation stage further takes $P$ as input to compute the momentum $M(t)$, based on $\alpha_i(t)$, $\beta_i$, and $p_i(t)$. Consequently, the daily opinion $\Omega(t)$ towards each candidate is computed as output from $M(t)$.

US Presidential Elections datasets
Data were aggregated from Real Clear Politics (2012a, 2016a), Understanding America Study (2016b), and Daily Kos (1968-2008, 2012b). Table 1 provides information on all 15 datasets alongside the US presidential election results for the Democratic, Republican and "other" candidates between 1968-2016. These values are used as ground truth for measuring the performance of our TA method. Extended information on the data is found in Appendix D.

Alternative poll estimation methods
Cumulative vote counting (CC) is applied by summing up all votes expressed by the polls $p^*_i(t)$ for each candidate $i$ over the total polling period $[0, d)$. Note that for CC we do not use the normalized value $p_i(t)$, but the absolute number of votes expressed in each poll, $p^*_i(t)$. Consequently, we define a cumulative momentum $cM_i(t)$, which is updated as:
$$cM_i(t) = cM_i(t-1) + p^*_i(t)$$
At the end of the polling period ($t = d$), each cumulative momentum $cM_i(t)$ stores the total number of votes expressed for candidate $i$. At any time, we can infer the current opinion towards a candidate, $c\Omega_i(t)$, by normalizing each momentum:
$$c\Omega_i(t) = \frac{cM_i(t)}{\sum_j cM_j(t)}$$
Survey averaging (SA) is applied by averaging the normalized poll results over the entire pre-election period. In order to express opinion $s\Omega_i(t)$ after $t$ elapsed days, we use the normalized poll vector $p_i(t)$ directly:
$$s\Omega_i(t) = \frac{1}{\left|\{k \in [0, t] : \text{a poll is available on day } k\}\right|} \sum_{k=0}^{t} p_i(k) \tag{8}$$
In Equation 8 we obtain the current poll at time $t$ (expressed as number of days) by summing up all normalized votes for the period $[0, t]$ and dividing by the number (cardinal) of polls in that same period.
The main distinction between CC and SA is that the first method uses the absolute number of votes for each candidate, whereas SA uses the normalized poll values.
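The two baselines can be sketched in a few lines of Python. The function names and the list-of-lists layout (one vector per day, zeros on days without polls) are assumptions made for this example.

```python
def cumulative_counting(raw_P):
    """CC: running sum of raw votes per candidate, normalized on each day."""
    n = len(raw_P[0])
    cM = [0.0] * n
    out = []
    for day in raw_P:
        cM = [m + v for m, v in zip(cM, day)]
        total = sum(cM)
        out.append([m / total for m in cM] if total > 0 else [0.0] * n)
    return out

def survey_averaging(P):
    """SA: mean of the normalized polls published in [0, t]."""
    n = len(P[0])
    sums = [0.0] * n
    count = 0                             # number of polls seen so far
    out = []
    for day in P:
        if any(v > 0 for v in day):       # a day with a published poll
            count += 1
            sums = [s + v for s, v in zip(sums, day)]
        out.append([s / count for s in sums] if count else [0.0] * n)
    return out

# A large early poll dominates CC, while SA weighs both polls equally.
raw_P = [[900, 100], [0, 0], [40, 60]]
P = [[0.9, 0.1], [0.0, 0.0], [0.4, 0.6]]
print(cumulative_counting(raw_P)[-1])
print(survey_averaging(P)[-1])
```

In this toy input the first poll has ten times more respondents than the second, so CC stays close to the first poll, whereas SA lands midway between the two.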
For each forecasting simulation, we measure the total error ε, expressed as the percentage-point offset from the final election results. Based on the real results $\Omega^{year}_{candidate}$ (given in Table 1), we define the relative estimation error ε as the sum of the absolute differences between the forecast and the real result over all three candidates $c \in \{D, R, O\}$:
$$\varepsilon = \sum_{c \in \{D, R, O\}} \left| \Omega_c - \Omega^{year}_c \right| \tag{9}$$
Here $\Omega_c$ denotes the forecast opinion for candidate $c$.
Varying $\alpha_i$ changes only the scale of the momentums $M_i(t)$. Hence, since we normalize all $M_i(t)$ in order to obtain the poll results $\Omega_i(t)$, the value of the amplitude becomes irrelevant. On the other hand, the impact of $\beta_i$ is significant, as illustrated in Figure 3.
The same process of finding an ideal combination $(\alpha, \beta)$ can be repeated for each dataset. However, choosing an optimal damping factor during a real-time pre-election period is not feasible, because we cannot compute ε without the final, real results. Therefore, we compute a pseudo-ideal β factor from the average of the best β values measured over all past datasets, with the expectation that it can be used for present and future predictions. As such, all forecasting simulations presented in this paper are based on the same data-derived damping factor: β = 1.1 for PTA, and β = 0.78 for ETA.
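The error measure and the averaging of the best damping factors can be illustrated with a short Python sketch. The function names and all numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from Table 1.

```python
def total_error(forecast, actual):
    """Sum of absolute differences (percentage points) over all candidates."""
    return sum(abs(forecast[c] - actual[c]) for c in actual)

def pseudo_ideal_beta(best_betas):
    """Average of the best damping factors found on past datasets."""
    return sum(best_betas) / len(best_betas)

# Hypothetical forecast and result for the three-candidate system.
forecast = {"D": 50.0, "R": 45.0, "O": 5.0}
actual = {"D": 48.0, "R": 46.0, "O": 6.0}
print(total_error(forecast, actual))        # |50-48| + |45-46| + |5-6|

# Hypothetical per-dataset optima, averaged into one reusable beta.
print(pseudo_ideal_beta([0.9, 1.0, 1.4]))
```

Because the best β of an ongoing election is unknown until the results are in, only the averaged value from past elections can be used in a live forecast.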

Results
A good electoral forecasting method should have two qualities: (i) it should produce poll estimations for all candidates that are, overall, as close as possible to the real election results, and (ii) it should correctly determine the winner of the election.
We quantify the first property by the total estimation error ε, given as the sum of the absolute differences between the forecast and the real result for all candidates (see Equation 9). A smaller ε translates into a better performing forecasting method. Table 2 displays ε between each forecasting method and the actual election results. Averaged over all datasets, we measure a forecasting error of 3.64 points for the best pollster, 7.31 for CC and 7.49 for SA. The errors of TA are only 3.28 points for PTA and 2.87 points for ETA. These results translate into a superior performance of TA over all other state of the art methods. We obtain a 0.364 point improvement for PTA, and 0.774 for ETA, compared to the best pollster estimations.
The second property is quantified by the offset from the real difference between the Democratic and Republican candidates (D-R), expressed in percentage points. Table 3 presents the real (D-R) differences for all election years, followed by the relative offsets of each forecasting method from this difference. For example, in 1968, (D-R) = -0.7 and the offset of PTA is -1.43 (we can infer that PTA estimated the (D-R) difference at -0.7 + (-1.43) = -2.13 points). By averaging all forecasting offsets (in absolute value) over all datasets, we measure a total (D-R) offset of 1.95 points for the best pollster, 4.07 for CC and 3.59 for SA. TA again outperforms the other methods, with an average offset of 1.74 for PTA and 1.91 for ETA.
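The (D-R) offset and the winner check can be written as two small Python helpers. The names `dr_offset` and `same_winner` are illustrative, and the vote shares below are illustrative values chosen only so that the margins reproduce the 1968 example from the text ((D-R) = -0.7, PTA offset = -1.43).

```python
def dr_offset(forecast, actual):
    """Forecast (D-R) margin minus the real (D-R) margin, in points."""
    return (forecast["D"] - forecast["R"]) - (actual["D"] - actual["R"])

def same_winner(forecast, actual):
    """True if the forecast and the result put the same candidate first."""
    return max(forecast, key=forecast.get) == max(actual, key=actual.get)

# Illustrative shares: real margin -0.7, forecast margin -2.13.
actual = {"D": 42.7, "R": 43.4, "O": 13.9}
forecast = {"D": 42.0, "R": 44.13, "O": 13.87}
print(round(dr_offset(forecast, actual), 2))
print(same_winner(forecast, actual))
```

A forecast can have a non-zero offset and still pick the right winner, which is why the two qualities are reported separately.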
Additionally, we investigate how many of the 13 elections (1968-2016) are correctly predicted in terms of picking the right winning party. The reference statistical methods are the least accurate, with CC predicting only 10 out of 13 presidents, and SA predicting 11 out of 13. The best pollsters manage to predict 11 out of 13, while PTA and ETA predict 12 out of 13 winners. No method was able to predict the correct 2016 winner. We did find one pollster which correctly predicted the winner of that election, but with a much greater error in terms of popular vote. Supporting Tables 2, 3 are provided in Appendix E.
Table 3: Column (D-R) represents the absolute percentage difference between the Democratic and Republican candidates after election. All subsequent columns represent the relative offset of the forecasting methods from the (D-R) difference. A smaller offset means a closer estimation of the winning party.
Furthermore, we highlight in Figure 4 the superior estimation performance of TA compared to the state of the art. We underline the fact that our TA methods outperform the best pollster, in terms of estimation error ε, on 8 out of 15 datasets (PTA), and on 12 out of 15 datasets (ETA). Overall, TA outperforms the competing predictors in 10 out of 13 election years (77%). The upper panel in Figure 4 graphically represents the ratio between the best pollster prediction error and the TA method prediction error (i.e., $\varepsilon_{Best}/\varepsilon_{TA}$). As such, values (represented as columns) over 1.0 mean a higher performance for our TA methods. The lower panel in Figure 4 classifies the cases when our PTA or ETA methods outperform the best pollster prediction (green, otherwise red), for each election dataset. We also represent the cases when any of the three compared methods manages to forecast the correct winner of the elections. The only notable difference is that the best pollster fails to forecast the winning party in the 2000 elections.

Real time feasibility analysis
We extend our analysis by exploring the feasibility of applying our theoretical framework during a real-time pre-election period. Thus, we use the 2016a (RCP) dataset and compare the prediction errors ε at different points in time. The dataset consists of 259 polls over a 529-day period prior to the elections. We arbitrarily choose to measure the predictions and the corresponding ε at t = {100, 250, 400, 500, 529}. Table 4 presents the estimation errors ε for all five forecasting methods (here the best pollster is RCP), and the improvement ratio is $\varepsilon_{RCP}/\varepsilon_{ETA}$. Values greater than 1 mean higher performance for ETA. Detailed experimental results are given in Appendix B.
Figure 4: Relative performance ratio between the best pollster prediction error and the PTA (violet), respectively ETA (yellow), prediction error. Values above 1 translate into higher prediction performance of our TA methods. In the lower panel we highlight the cases when TA outperforms the best pollster (green, or red otherwise), and the election years in which any of the three predictors manage to forecast the correct election winner.
We find that the prediction accuracy of the statistical methods (CC, SA) depends mainly on the amount of data, as their forecasts slowly converge towards the real results. Conversely, the other methods do not depend on the increasing amount of data (as we get closer to the election day), but rather on the volatility of the socio-political context. Namely, RCP has the highest fluctuations, registering jumps from a low ε = 2.8 (February 2016) to a very high ε = 10.2 (July 2016), then falling back to ε ≈ 4-5 (October 2016). Our TA methods are more stable than RCP, and are not influenced by the same social volatility we measured in RCP. We note that the Democratic candidate gathered increasing popularity until March 2016, and TA reflects this by giving her a higher virtual chance of winning. Nevertheless, as the popularity of the Republican candidate rapidly increased during mid-spring and mid-summer 2016, our forecast improves, leaning towards a balanced outcome that is closer to the final registered popular vote.
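Measuring the error at several checkpoints of an ongoing pre-election period could look like the sketch below. The function name `errors_at_checkpoints` is hypothetical, and the convention that a checkpoint at or beyond the last simulated day uses the last available forecast is an assumption.

```python
def errors_at_checkpoints(omega, actual, checkpoints):
    """Total error (points) of the forecast available at each checkpoint day.

    omega: daily opinions as fractions, omega[t][i] for t in [0, d).
    actual: real results in percent, one value per candidate.
    """
    result = {}
    for t in checkpoints:
        day = min(t, len(omega) - 1)    # t = d falls back to the last day d - 1
        forecast = omega[day]
        result[t] = sum(abs(100 * f - a) for f, a in zip(forecast, actual))
    return result

# Hypothetical three-day opinion series for two candidates.
omega = [[0.50, 0.50], [0.60, 0.40], [0.55, 0.45]]
print(errors_at_checkpoints(omega, actual=[52.0, 48.0], checkpoints=[0, 1, 3]))
```

Such a loop only evaluates forecasts after the fact; during a live campaign the real results are unknown, so the errors cannot be computed until the election is over.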
We have found that, unlike the best pollsters, which rely on MRP combined with social, economic and political trends, our TA method improves its forecasting based solely on the time-aware convergence of public opinion, which can be considered to have significant estimation power [14,38,39].

Discussion
Our study differs in several respects from previous work on election forecasting. In comparison to basic statistical approaches, like CC and SA, our TA needs additional temporal information on each pre-election poll (i.e., the date when a poll was made public). Rather than simply averaging the information, we feed the survey data to our simulation framework, which is highly influenced by the temporal aspect. Both PTA and ETA methods model opinion momentum, in time, as a function which bounces up when opinion is injected, and dampens down otherwise. This process resembles the way a capacitor charges (i.e., opinion being injected) and discharges slowly (i.e., relaxation state, no opinion injected). Compared to state of the art methods, like MRP [15,16,17], our TA does not need any demographic, economic, or political information related to the context of the election. This distinction represents a significant advantage for TA over MRP, since our method may be applied, given enough reliable public polls, in any political region of the world. Accordingly, in the case study in this paper, we did not consider any additional information about the USA during the 1968-2016 period.
In essence, the TA model is aimed at improving the prediction of the popular vote. Nevertheless, we find studies especially tailored to systems like the US, which are based on the college system [17,16], and, conversely, tailored to systems utilizing a direct popular vote, like France [11]. The work of [17,16] manages to forecast US presidential, senatorial, and gubernatorial elections at the state level by incorporating state level demographics to better predict the college vote. However, we have developed the TA forecasting model to be usable outside any political context, as long as there is sufficient and reliable pre-election poll data. This choice may give it an apparent disadvantage in the US system, but as our case study was intended to show, in practice TA still yields superior performance. Moreover, where other models may need specific tuning to be used in other countries of the world, TA will work without the need for customization.
In this study we start from the premise that the opinion injected in social networks, stemming from publicly accredited opinion polls, has very high media coverage. To this end, recent studies on how US adults keep themselves informed about political candidates and issues show that TV (news) occupies the leading spot with 73%, followed by 45% for news websites/apps, 24% for newspapers, and 21% specifically for social media. These statistics support our premise, since opinion injection from pollsters is practically done through all the enumerated media types [40]. Furthermore, in terms of polling reliability, current media types are diverse, but their combined coverage remains high, including in the electoral context, and polling accuracy remains reliable [41].
Finally, as an explanation of why TA outperforms more complex data-driven methods used by pollsters (e.g., MRP), we notice that the forecasts of TA for the "other" (O) candidates are lower, and implicitly closer, to the real results. Averaged over all datasets, the pre-election surveys predict that 11.61% will vote for the O candidate; TA predicts 6.94%, and the best pollster predicts 7.57%. However, following each election, we compute the real average percentage for the O candidate at only 5.08%. This means that, even though the forecasts for the D and R candidates may be realistic, the public opinion polls are unable to distribute a difference of ≈ 6.5% of remaining votes. On the other hand, the best pollsters are unable to distribute ≈ 2.5% of overall votes, and TA only ≈ 1.8% of votes. This observation does not mean that TA simply overestimates the percentages for the two main candidates; it means that TA is able to distribute the votes for the O candidate more realistically based on the dynamics of expressing opinion just before the end of the pre-election period, which usually sees an abrupt drop of ≈ 30% in popularity for O (see an extended statistical analysis in Appendix C). Of course, these performance gains of TA can be further analyzed in future research, and supported by either social psychology or political science assumptions.

Limitations of the model
Our TA model has some limitations, which we discuss below. For instance, we consider social media as a ubiquitous diffusion mechanism, but there are also so-called non-users. We added this simplification to our model due to the difficulty of acquiring data for offline users, and due to the reliability of that data. Official statistics estimate that 3/4 of the US population are engaged in social media. Even in this case, we argue that our model's simplification remains robust, as a study on political attitudes concludes that no statistically significant differences arise between social media users and non-users on political attention, values or political behavior [42].
Another realistic simplification in our model allows us to consider the electoral system relatively hard to shape from the outside, so that we do not have to account for data beyond our reach (i.e., external influences). The liberal democracy index was developed to measure the robustness of a political system, and, according to a study by the Swedish V-Dem institute, the USA scores 0.75 (out of 1) and lies within the top 20% liberal nations [43]. As such, we can consider the studied US electoral system as robust.
Existing vote polarization and poll credibility are also important topics to consider in the future [44]; however, our electoral forecasting model was designed to be, as much as possible, unaffected by any social and political contexts, including the effects of opinion polarization.

Conclusions
Driven by the increases in access to data and computational power, modern election forecasting systems should, intuitively, evolve along one of two directions: a possible microscopic framework built on extensively detailed social media data, or a possible macroscopic framework employing complex data science techniques on demographics and economic indices. However, our proposed model represents a trade-off between both micro and macro worlds, and the result is a simple, intuitive and robust methodology which can be applied on any pre-election data with temporal information. We argue that this simplification is effective since social influence often pertains to the knowledge of crowds [38]. In other words, the aggregated judgment of many individuals (macro-scale) can be more accurate than the judgments of individual experts (micro-scale) [39]. This effect is significantly strengthened when applied on larger population sizes [38].
Despite the apparently simplified assumptions behind TA, revolving around the idea that we can apply a microscopic opinion interaction model to predict macroscopic behavior, our results point to the fact that time-awareness is more significant in poll forecasting than previously considered. In our case study, TA outperforms state of the art election forecasting methods in 10 out of the 13 presidential elections. TA accumulates an average forecasting error of 2.87-3.28 points, while statistical methods accumulate 7.48 points error, and the best pollster estimations accumulate 3.64 points. This translates into a roughly 30% prediction improvement for our method, in terms of forecasting accuracy of the popular vote.
Moreover, analyzing the methods of reputable institutions in the US, like the Huffington Post, Real Clear Politics, or FiveThirtyEight, we have not seen any temporal attenuation method that is similar to the one proposed in this paper. Other statistical or data science approaches (e.g., MRP) rely on specific social, economic, and political contexts to improve and tune their predictions. Conversely, our TA does not require socio-economic contextual information, and we believe that this independence translates into an advantage. It will probably never be possible to create the perfect forecasting system, due to the complexity of elections, but our TA represents a novel and distinguishable scientific proposal with proven high performance.

Conflicts of interest
The authors declare that there are no conflicts of interest.