Analyzing the influence of web search behavior on electricity market price: a case study of Japan electric power exchange

Gotoh, Ryosuke

doi:10.1007/s42001-024-00259-6

Research Article
Open access
Published: 03 April 2024

Journal of Computational Social Science

Ryosuke Gotoh ORCID: orcid.org/0000-0002-1721-3550¹

Abstract

The Japan Electric Power Exchange (JEPX) has introduced a feed-in premium to promote the trading of renewable energy electricity in the market; thus, the exchange has become increasingly important for RE companies to maintain profitability in market trading. However, electricity prices are not only affected by directly measurable factors such as electricity demand, fuel prices, and weather but also by corporate bidding strategies, social conditions, and other human behaviors, making it difficult to predict electricity prices. Given that electricity demand is related to human behavior, this study focuses on web search behavior and clarifies the relationship between keyword search volumes and electricity market prices in Japan. Correlation and vector autoregression analyses results show a moderately strong positive correlation between the logarithmic difference of the keyword search volume and that of the electricity price. In addition, we find that the logarithmic difference of the electricity price tends to increase when that of the keyword search volume on the previous day increases. These results suggest that search volumes of specific keywords can be effective explanatory variables for area price prediction models and can help identify signs of price spikes.

Introduction

In Japan, the Feed-in Tariff (FIT) system was introduced in 2012 to promote renewable energy (RE). This system ensured that electricity generated from renewable sources would be purchased at a fixed price for a certain period. Thanks to the system, RE suppliers could avoid market risks and stabilize their prospects for investment recovery. As a result, by the end of FY 2021, the renewable energy generation capacity in operation had increased by approximately 3.3 times compared to before the introduction of the FIT system [1]. However, some issues have also been identified, such as the increased burden on the public associated with purchasing RE electricity and the persistently high cost of generating renewable energy. In response to these issues, the Japanese Feed-in-Premium (FIP) system began operating in 2022 to encourage market trading of RE electricity and make RE a competitive power source. When RE power suppliers sell electricity in the market through the FIP system, a certain amount of payment (premium) is given on top of the market price [2]. However, if power suppliers cause a discrepancy between planned and actual generation (imbalance) in market trades, they must bear the cost of compensating for the imbalance as a penalty [3]. RE suppliers, whose power generation is highly uncertain, are therefore likely to cause imbalances, and they need to avoid the risk of electricity price spikes to prevent profit deterioration.

Regarding Japan’s electricity market, several studies focus on predicting RE generation and spot prices. Nakayama et al. incorporate Geographic Information System (GIS) information into a prediction model for photovoltaic power generation and spot prices and report that GIS information improves prediction accuracy [4]. Kaneko et al. apply sparse modeling to select important variables for spot price prediction in the Japanese electricity market and show that the importance of calendar dates is relatively higher than that of other variables [5]. Ohta et al. develop a novel price prediction model based on neural networks and show that it forecasts well except during price spikes [6]. Yamada et al. apply a generalized additive model to develop a spot price forecasting model for the Japanese electricity market and mention that challenges remain in forecasting when prices spike [7]. Adline et al. focus on price spikes in JEPX and attempt to develop a price spike prediction model by applying a stochastic process called the Hawkes process [8]. Maekawa et al. show, through mathematical modeling that considers merit order curves, that speculative actions of electricity suppliers can change the inelasticity of demand in the electricity market [9]. On the other hand, it is also suggested that price spikes in the electricity market are difficult to predict because of the significant influence of human behavior and intention, such as the bidding strategies of market participants [10]. However, these studies focus on directly measurable information, such as weather and fuel prices, as explanatory variables for electricity price prediction, or try to model the electricity market mathematically; the influence of human behavior is not considered in their models. Therefore, although these models are effective in addressing spot prices under economically rational operation and in understanding past phenomena, their applicability to predicting electricity prices under human uncertainty is limited.

As mentioned above, RE companies must face the uncertainty of human behavior in the electricity market, in addition to variable electricity generation. However, decision making under uncertainty is generally risk-averse [11, 12], and uncertainty can hinder economically rational decisions by RE companies [13,14,15]. In addition, electricity consumers tend to be risk-averse when they select electricity plans [16]. Given these factors, renewable electricity suppliers are expected to be more reluctant to bid for renewable electricity because of concerns about the uncertainty of price spikes. This could be one factor preventing the promotion of market purchases and sales of renewable electricity. Considering the future increase of JEPX participants and the growth of uncertainty due to human factors in addition to variable RE power generation, incorporating measurements of human behavior into predictive models can be an effective approach. How, then, can we measure human behavior and mitigate the uncertainty of price spikes?

Owing to the development of information technology in recent years, large-scale social data on human behavior and communication have become available through websites and social networking services, and research on social phenomena using these data is attracting attention [17]. Social media is expected to become a “social telescope” that provides a bird's-eye view of society and human behavior [18,19,20], and a wide range of research fields utilize social data. In the field of market transactions, the application of social data from the Web is relatively well discussed in financial markets [21,22,23]. Ersan et al. note that a growing number of reports combine financial data with social media information to predict stock prices [24]. Bollen et al. clarify that daily fluctuations in sentiment, as derived from Twitter data, exhibit a statistically significant correlation with the daily closing prices of the Dow Jones Industrial Average [25]. Of particular note is the attempt to understand market behavior from keyword search volumes provided by Google [26], given that actors begin their decision-making processes by attempting to gather information [27]. Preis et al. provide evidence of correlation between the weekly transaction volumes of S&P 500 companies and the weekly search volumes of the corresponding company names [28]. In addition, changes in the volume of Google keyword searches may identify signs of falling stock prices [29]. These reports suggest that some social media data correlate with financial data, and that keyword search volumes can serve as one of the indicators in the financial market.

Considering the insights into the financial market stated above, keyword search volumes can be surrogate indicators of human behavior in electricity markets and can mitigate the uncertainty of price spikes because electricity demand is closely related to human behavior, and social conditions influence corporate strategies (Fig. 1). However, no studies explore the utilization of social data in the electricity market, where participants are fewer than in financial markets and the mechanisms are different. Therefore, this study clarifies the relationship between keyword search volumes and electricity market prices by examining the Japan Electric Power Exchange (JEPX); it employs keyword search volumes obtained from Google Trends as representative social data on human behavior.

This study introduces a novel approach to analyzing electricity market prices using web search data. The methodology developed, particularly the use of time-series clustering and statistical analysis, is not region-specific and can be applied to different markets and geographic locations. In addition, the proposed approach highlights the impact of human behavior, as captured through online searches, on electricity market prices. The knowledge obtained by this study will contribute to understanding markets influenced by consumer behavior and sentiment.

Methodology

Overview of the evaluation procedure

This study focuses on the relationship between keyword search volumes on the Web, which is large-scale social data, and electricity market prices. The section "Conditions for the analysis" defines the conditions for analysis such as target data. The section "Categorization of search volumes by time series clustering" categorizes keyword search volumes based on the similarity of waveforms using a machine learning method to understand the time-series characteristics of the keywords. The sections "Stationarity evaluation of time series data" and "Evaluation of the relationship between area prices and search volumes using statistical methods" examine the relationship between keyword search volumes and electricity market prices using statistical methods.

Conditions for the analysis

(1) Scope of analysis

This study selects the Tokyo and Kansai regions in Japan, which are the first and second most populous regions in the country, respectively. We analyze the FY2022 area prices of the JEPX spot market in the two regions and search volumes for keywords obtained from Google Trends.

(2) Overview of the electric utility system and electric power exchange in Japan

The Japanese power grid is characterized by a longitudinal transmission system without international connections and is split into ten regions: Hokkaido, Tohoku, Tokyo, Hokuriku, Chubu, Kansai, Chugoku, Shikoku, Kyushu, and Okinawa. The utility frequency of the first three regions is 50 Hz, while that of the remaining regions is 60 Hz. The transmission system in each area is largely independent because the power interconnection capacity is limited to neighboring areas. However, that of the Okinawa region is completely isolated from the other regions [30].

The JEPX was established in 2003 as part of the deregulation of Japan’s electrical utility system. Approximately 300 electric companies in all regions except Okinawa have joined JEPX [31]. Because Tokyo and Kansai are the most populated areas at 50 and 60 Hz, respectively, they are selected as case study areas. Figure 2 presents the power supply composition of each region. Approximately 85% of total electricity generation in Tokyo comes from thermal power. This is because, as of 2023, nuclear power plants, including the Fukushima Daiichi power plant damaged in the Tohoku earthquake in 2011, are not permitted to restart in the 50 Hz area [1, 30]. However, in Kansai, which is part of the 60 Hz area, nuclear power plants are being restarted and account for approximately 20% of the total power generation. Regarding RE, more electricity is generated in Tokyo than in Kansai.

As of 2023, JEPX is the only exchange that operates a wholesale electricity trading market, where approximately 40% of the electricity sold in Japan is traded [31]. JEPX operates several electricity markets: spot market, intraday market, forward market, and baseload market. Because approximately 98% of the total trading volume of JEPX is in the spot market, we focus on the spot market [31]. The spot market, also called the “day-ahead market,” is where the electricity to be delivered the next day is traded, and the next day is divided into 48 frames of 30 min each. The auction process in the spot market is facilitated by a mechanism called the blind single-price auction. Under the single-price auction method, all submitted bids are separated into “sell” and “buy” categories after bidding closes. The bids in each category are then aggregated to form supply and demand curves. The point at which these curves intersect establishes both the contract price and the corresponding quantity. The contract price is applied to all successful bidders, regardless of their bidding prices [32].
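
The intersection of the supply and demand curves can be illustrated with a small sketch. It is an illustration only: the rule JEPX uses to fix the price at the intersection is not described here, so taking the price of the last accepted sell bid is an assumption, as are the function name and the `(price, quantity)` bid format.

```python
def clear_single_price_auction(sell_bids, buy_bids):
    """Return (contract_price, contract_quantity) for stepwise bid curves.

    sell_bids, buy_bids: lists of (price, quantity) tuples (hypothetical format).
    Assumption: the contract price is the price of the last accepted sell bid.
    Returns (None, 0.0) when the curves do not cross.
    """
    sells = sorted(sell_bids, key=lambda b: b[0])               # supply: cheapest first
    buys = sorted(buy_bids, key=lambda b: b[0], reverse=True)   # demand: highest first

    i = j = 0
    sell_left = sells[0][1] if sells else 0.0
    buy_left = buys[0][1] if buys else 0.0
    traded = 0.0
    price = None

    # Walk along both curves while the best remaining buyer still pays
    # at least what the cheapest remaining seller asks.
    while i < len(sells) and j < len(buys) and buys[j][0] >= sells[i][0]:
        matched = min(sell_left, buy_left)
        traded += matched
        price = sells[i][0]
        sell_left -= matched
        buy_left -= matched
        if sell_left == 0:          # this sell bid is exhausted
            i += 1
            if i < len(sells):
                sell_left = sells[i][1]
        if buy_left == 0:           # this buy bid is exhausted
            j += 1
            if j < len(buys):
                buy_left = buys[j][1]
    return price, traded


sell = [(5.0, 100), (8.0, 100), (12.0, 100)]
buy = [(15.0, 80), (10.0, 80), (6.0, 80)]
price, quantity = clear_single_price_auction(sell, buy)
```

In this toy example the single price applies to every accepted bid, including the seller who offered at 5.0, which is the property the text describes.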

The contract price, which is determined at the point of intersection between the comprehensive supply and demand curves encompassing all selling and buying bids within the nine regions, is referred to as the system price. However, because there are capacity limitations on the interconnection lines in each region of Japan's power grid, the supply quantity from one area to another corresponding to the system price may exceed the limitations. In this case, a supply–demand curve is formulated by considering bids from sellers and buyers within each region. The contract price, which is derived from the intersection of these bids of each region, is called the area price [32]. Given that Tokyo and Kansai are the focal points of the investigation, we explore the relationship between the area price and search volume within these two regions.

In the context of actual supply and demand on trading days, when deviations between the planned and actual generated power occur, which are called “imbalances,” electricity distribution companies use their own regulating capacities to rectify the imbalances. Power generation and retail companies responsible for causing imbalances are required to settle the regulating cost corresponding to these imbalances with electricity distribution companies post facto [3]. In particular, RE companies face a higher risk of imbalance penalties during periods of escalated area prices owing to uncertainty in their power supply. Therefore, this study focuses on the peak values of area prices for each day in FY2022 [33].

(3) Google Trends

Google Trends provides search volumes for keywords on the Google search engine [36]. Search volumes from Google Trends are not absolute values of the number of searches but relative search volumes [%] with the maximum number of searches in a specified period as 100. Google Trends allows users to specify keywords, periods, and regions from which search volumes must be obtained. This study targets the search volume of “Tokyo Prefecture” and “Osaka Prefecture” as representative districts of Tokyo and Kansai, respectively (see Appendix A for the relevance of target regions to obtain data). Figure 3 shows the time-series trend of the search volume of “electric power” as an example keyword, with the area price also shown. This study hypothesizes that the search volumes of some keywords help capture signs of sudden changes in electricity prices and elucidates the relationship between JEPX area prices and search behavior. To achieve this, search keywords are selected based on the following criteria to evaluate the characteristics of a diverse range of keywords:

1. Context of energy: 40 keywords of high importance calculated by the TF-IDF method for each chapter and section of the Energy White Paper 2022 (Table 1, left side).
2. Social interest: The 25 most frequently searched keywords in Google in FY2022 (Table 1, right side).

Table 1 The keyword list for the evaluation chosen from Energy White Paper 2022 and the top search keywords by Google

Regarding criterion 1, keywords with high importance in the context of “energy” are expected to include those that are relevant to the area price trend. The TF-IDF method [37,38,39], a quantitative method to evaluate the importance of words in documents, is applied to extract specific keywords, using each chapter and section of the Energy White Paper 2022 [1] as input. The TF-IDF value is obtained for each keyword in each chapter and section, and the sum of the TF-IDF values is calculated for each keyword. The 40 keywords with the highest summed TF-IDF values are designated as the keywords of the Energy White Paper 2022 (Table 1, left side). The keyword list includes energy-related terms such as “energy,” “electricity,” and “nuclear power,” as well as business-related terms such as “business,” “price,” and “demand.” We refer to these keywords as “Group A.”
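
The ranking step described above can be sketched as follows. The exact TF-IDF variant is not stated in the text, so this sketch assumes term frequency as a share of the section length and inverse document frequency as log(N / df); the function name and the pre-tokenized input format are also hypothetical (Japanese text would first need a tokenizer, which is outside this sketch).

```python
import math
from collections import Counter


def top_keywords_by_summed_tfidf(sections, top_k):
    """Rank terms by the sum of their TF-IDF values over all sections.

    sections: list of token lists, one list per chapter/section.
    Assumed weighting: tf = count / section length, idf = log(N / df).
    """
    n_sections = len(sections)

    # Document frequency: in how many sections each term appears.
    doc_freq = Counter()
    for tokens in sections:
        doc_freq.update(set(tokens))

    summed = Counter()
    for tokens in sections:
        if not tokens:
            continue
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_sections / doc_freq[term])
            summed[term] += tf * idf

    # Highest summed score first; ties are broken alphabetically.
    ranked = sorted(summed.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


toy_sections = [
    ["energy", "energy", "price"],
    ["energy", "demand"],
    ["nuclear", "nuclear", "nuclear", "price"],
]
top_two = top_keywords_by_summed_tfidf(toy_sections, top_k=2)
```

With this weighting, a term that appears in every section receives an IDF of zero, so the ranking favors terms that are concentrated in particular sections.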

Regarding criterion 2, the trend of area prices might correlate not only with keywords related to energy or electricity but also with keywords linked to events of societal interest. The 25 keywords with the highest search frequency on Google in FY 2022 are selected. Table 1 (right side) presents the various keywords selected: daily searched keywords such as “weather” and “news”; web services such as “YouTube,” “Twitter,” and “Amazon”; keywords reflecting the current times such as “COVID-19” and “Corona”; manga and anime titles; and others that attract attention in the world. We call these keywords “Group B.”

Categorization of search volumes by time series clustering

In the section "Conditions for the analysis", 65 keywords are assessed for their relevance to JEPX area prices. Although this study represents a novel endeavor to elucidate the connection between JEPX area prices and keyword search volumes, the sheer volume of time-series data poses a challenge for characterization. Consequently, we extract overarching patterns from this extensive time-series data through the utilization of “time-series clustering,” a form of unsupervised machine learning.

TimeSeriesKMeans employs the k-means algorithm [40, 41] to analyze time-series data. The k-means algorithm, a commonly used clustering technique, functions by grouping data in a manner that minimizes the cumulative distance between the center point of each cluster and the data points within that cluster [42]. The simplicity of this algorithm makes it appropriate for clustering large amounts of data. Notably, TimeSeriesKMeans diverges from the conventional k-means methodology by adopting Dynamic Time Warping as its distance metric instead of the Euclidean distance [41, 43]. This choice allows TimeSeriesKMeans to effectively measure the similarity between time-series data, thereby facilitating the clustering process. This study segments the dataset into five distinct clusters based on the results obtained from the analysis of cluster numbers using the elbow method. Appendix B presents the results of the elbow method.
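
To make the difference from the Euclidean distance concrete, the following is a minimal sketch of a Dynamic Time Warping distance between two one-dimensional series. It uses one common definition (square root of the summed squared differences along the best alignment path); the function name is hypothetical, and this is not the clustering routine itself, only the distance it relies on.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    cost[i][j] holds the smallest accumulated squared difference when
    aligning the first i points of a with the first j points of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # A point may be matched to one or several points of the other series.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


# The same peak, shifted by one step: the warping path absorbs the shift.
peak = [0, 1, 2, 1, 0]
shifted_peak = [0, 0, 1, 2, 1, 0]
distance = dtw_distance(peak, shifted_peak)
```

Because the alignment may stretch or compress the time axis, two search-volume waveforms with the same shape but slightly different timing are treated as similar, which a point-by-point Euclidean comparison would not do.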

Stationarity evaluation of time series data

In general, the statistical analysis of time-series data assumes that the process is stationary. Process stationarity signifies that both the expected value and variance of data remain constant across time intervals. This stationarity implies that such processes exhibit an absence of trends and tend to revert to mean values over extended periods. Nevertheless, a considerable portion of time-series data within the domains of economics and finance deviate from the ideal stationary process. Therefore, nonstationary data are typically converted into stationary data for analysis or modeling. Given the current context in which the understanding of the time-series characteristics of JEPX area prices and keyword search volumes remains limited, assessing the stationarity of these datasets becomes important. This evaluation is essential to ensure the appropriate assessment of the relationship between area prices and search volumes.

Time-series data in economics and finance such as stock prices are often non-stationary in the original series but stationary in the difference series. Additionally, in the assessment of the profitability of economic and financial data, the logarithmic difference series is often adopted as a measure of the rate of return. The series obtained by taking the logarithm of the original series and then the difference between the logarithms of consecutive data points is referred to as a logarithmic difference series. Therefore, in this study, stationarity is evaluated for both the original series and the logarithmic difference series pertaining to area prices and keyword-specific search volumes. This approach aligns with typical practices in analyzing time-series data in economics and finance.
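
The transformation can be written in a few lines. This sketch assumes strictly positive inputs; how zero search volumes are handled is not stated in the text, so the function (a hypothetical name) simply rejects them.

```python
import math


def log_difference(series):
    """Return log(x[t]) - log(x[t-1]) for t = 1, ..., len(series) - 1."""
    if any(value <= 0 for value in series):
        # Assumption: non-positive values must be handled before this step.
        raise ValueError("log difference requires strictly positive values")
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(series, series[1:])]


# Doubling and then halving give log differences of equal size and opposite sign.
changes = log_difference([10.0, 20.0, 10.0])
```

The result has one element fewer than the input, and for small changes each value is close to the relative change between consecutive days.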

The augmented Dickey-Fuller (ADF) test, a method commonly employed for testing time-series data stationarity, is used to assess the stationarity of the area prices and keyword search volumes. The null hypothesis of the ADF test is that the time series contains a unit root; when the test statistic is significant, the null hypothesis is rejected, indicating that the time-series data are statistically stationary and do not include a unit-root process. Notably, the ADF test necessitates selecting an appropriate model for both null and alternative hypotheses, considering the characteristics of the data. Consequently, the critical value for the statistical hypothesis test varies based on the selected model. Given the area prices and keyword search volumes analyzed in this study, three types of models are adopted: (1) without a constant term, (2) with a constant term, and (3) with a constant term and a first-order trend.
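
For reference, the three model types can be written in the standard textbook form of the ADF regression. The notation below is assumed here rather than taken from this study: y_t is the tested series, Δ is the first difference, p is the number of lagged differences, and α, β, γ, and δ_i are coefficients.

$$\text{(1)}\quad \Delta y_t=\gamma y_{t-1}+\sum_{i=1}^{p}\delta_i\,\Delta y_{t-i}+\epsilon_t$$

$$\text{(2)}\quad \Delta y_t=\alpha+\gamma y_{t-1}+\sum_{i=1}^{p}\delta_i\,\Delta y_{t-i}+\epsilon_t$$

$$\text{(3)}\quad \Delta y_t=\alpha+\beta t+\gamma y_{t-1}+\sum_{i=1}^{p}\delta_i\,\Delta y_{t-i}+\epsilon_t$$

In each case the null hypothesis is γ = 0 (a unit root), tested against the alternative γ < 0; the critical values differ among the three forms because the deterministic terms differ.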

Evaluation of the relationship between area prices and search volumes using statistical methods

Considering the outcomes of the stationarity test of the time-series data in the previous section, a statistical assessment of the relationship between area price and keyword search volume is conducted. This study examines the relationship between keyword search volumes and area prices to identify keywords that can predict price spikes in the spot market (day-ahead market). To achieve this goal, it is important to identify keywords whose search volumes (1) correlate with the area prices on the same day and (2) change on the previous day, ahead of the change in the area price for the day.

For (1), the Pearson correlation coefficient between the area price for the day and the search volume for each keyword is calculated to confirm the correlation. For (2), Vector Autoregressive (VAR) models are developed for area prices for the day. Each model incorporates the area price and one of the keyword search volumes for the previous day as explanatory variables, as shown in Eq. (1). The VAR model is an extension of the autoregressive (AR) model that encompasses multiple explanatory variables with different time lags. Nevertheless, since this study specifically examines the change in area price for the day and in search volume for the previous day, VAR models comprising two variables with a single lag (t-1) are developed: the area price and one of the keyword search volumes with a one-day lag (i.e., the previous day). The statistical significance of the regression coefficients obtained from the VAR model is evaluated to demonstrate the relationship between electricity prices for the day and keyword search volumes for the previous day.

$$Y\left(t\right)={\beta }_{0}+{\beta }_{1}Y\left(t-1\right)+{\beta }_{2}{X}_{n}\left(t-1\right)+\epsilon \tag{1}$$

where Y is the area price, X_n is the search volume for keyword n (n = 0, …, 64), t is the day, β₀, β₁, and β₂ are the regression coefficients, and ε is the error term with mean 0 and constant variance.
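
As a rough illustration of how the coefficients in Eq. (1) can be estimated, the sketch below fits the price equation by ordinary least squares using only the standard library. The function name is hypothetical, the sketch covers only the price equation (not the companion equation for the search volume), and it does not compute the significance tests that the study relies on.

```python
def fit_price_equation(y, x):
    """Estimate (b0, b1, b2) in Y(t) = b0 + b1*Y(t-1) + b2*X(t-1) + e by OLS.

    y: area price series, x: search volume series for one keyword,
    both already transformed (for example, log-differenced) and equally long.
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    rows = [(1.0, y[t - 1], x[t - 1]) for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]

    # Normal equations (X'X) b = X'y stored as an augmented 3x4 matrix.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * target for r, target in zip(rows, targets))]
         for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("regressors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return tuple(a[i][3] / a[i][i] for i in range(3))


# Noise-free toy data generated from known coefficients (0.5, 0.3, 0.8).
x_toy = [0.1, -0.4, 0.7, 0.2, -0.9, 0.5, -0.3, 0.6]
y_toy = [0.0]
for t in range(1, len(x_toy)):
    y_toy.append(0.5 + 0.3 * y_toy[t - 1] + 0.8 * x_toy[t - 1])
b0, b1, b2 = fit_price_equation(y_toy, x_toy)
```

A positive and statistically significant b2 would correspond to the case in which an increase in the previous day's search volume precedes an increase in the area price; judging significance requires standard errors, which a statistics package would normally provide.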

Results and discussion

Keyword categorization by time series clustering

Figure 4 presents the clustering results using TimeSeriesKMeans, and Table 2 summarizes the corresponding clusters for each keyword. Figure 4 comprises five graphs, designated as clusters 0 to 4, from top to bottom. In each graph, the thick colored line represents the trend of the cluster center, and the thinner gray lines represent the search volume trends associated with keywords classified within that specific cluster. As Table 2 shows, 14 of the 40 keywords in Group A are classified into different clusters in Tokyo and Kansai, whereas only 2 of the 25 keywords in Group B belong to different clusters. Therefore, Group A, which represents the context of energy, contains a relatively large number of keywords whose waveforms differ by region. In contrast, the waveforms of Group B, which represents social interest, are similar across regions.

Table 2 Results of time-series clustering by keywords, keywords without “x” share the same cluster both in Tokyo (T) and Kansai (K) regions

Figure 5 shows the standard deviation of the search volume for each keyword on the horizontal axis, and the mean value on the vertical axis. The color of each data point corresponds to the cluster to which it belongs. Since TimeSeriesKMeans clusters similar time series trends based on the distance between the data, the standard deviation and mean values of the time-series data do not necessarily directly explain the characteristics of each cluster. However, Fig. 5 indicates that the mean and standard deviation of each keyword, along with the clustering results, contribute to a comprehensive understanding of the time-series attributes of each keyword. The characteristics of each cluster are as follows.

Cluster 0

Median of mean: 32.2 (2nd lowest); median of standard deviation: 20.5 (1st highest)

As the waveforms of this cluster show, daily search volumes are relatively low; however, interest in keywords frequently increases and decreases over large ranges.

Keywords (Groups A, B): Tokyo (6, 0), Kansai (11, 0)

All keywords in this cluster belong to Group A, indicating shared themes related to energy. Keywords such as “nuclear power,” “supply,” “implementation,” “crude oil,” “energy saving,” and “department” are common to both Tokyo and Kansai. Nevertheless, Kansai has more Group A keywords, including “fuel,” classified within this cluster than Tokyo does.

Cluster 1

Median of mean: 56.4 (2nd highest); median of standard deviation: 16.9 (2nd highest)

Keywords belonging to this cluster maintain consistent popularity, but also have a relatively large range of increases and decreases depending on the interest in the keyword.

Keywords (Groups A, B): Tokyo (17, 2), Kansai (10, 1)

All keywords in this cluster belong to Group A, except for “Google” and “translation” in Tokyo and “translation” in Kansai from Group B. Tokyo and Kansai share keywords such as “energy,” “business,” “technology,” and “economy.” This cluster most strongly represents the energy context, particularly for Tokyo.

Cluster 2

Median of mean: 41.7 (middle); median of standard deviation: 9.65 (2nd lowest)

Keywords belonging to this cluster are widely distributed in mean values, but show moderate popularity and relatively stable volatility overall. However, waveform spikes are occasionally observed.

Keywords (Groups A, B): Tokyo (6, 9), Kansai (9, 10)

Tokyo and Kansai share several keywords, such as “power generation” and “electric power” from Group A, and “weather” and “Yahoo” from Group B, but relatively more Group B keywords are assigned.

Cluster 3

Median of mean: 20.25 (1st lowest); median of standard deviation: 12.2 (middle)

Interest in keywords in this cluster is relatively low; however, spike-shaped and characteristic mountain-shaped waveforms are occasionally observed.

Keywords (Groups A, B): Tokyo (6, 3), Kansai (4, 3)

Though Tokyo and Kansai share several keywords, this cluster is characterized by keywords such as “COVID-19,” “Pokémon,” and “ONE PIECE” from Group B, whose search volumes increase when events that attract wide attention occur, such as an increase in infected patients or the release of movies.

Cluster 4

Median of mean: 62.5 (1st highest); median of standard deviation: 7.5 (1st lowest)

The keywords in this cluster maintain high popularity, and the variation is relatively smaller than that in other clusters.

Keywords (Groups A, B): Tokyo (5, 11), Kansai (6, 11)

This cluster includes the highest count of Group B keywords and is presumed to be the cluster that best captures daily search behavior. For instance, “YouTube,” “Twitter,” “Instagram,” and “sports” are assigned to this cluster in both Tokyo and Kansai.

Evaluation of stationarity of time series data

Table 3 shows the results of the ADF test for area prices and search volumes of the original series (Table 3b) and the logarithmic difference series (Table 3c) for Tokyo. The results of time-series clustering, which correspond to Table 2, are also shown in Table 3a for reference. Table 4 presents the results for Kansai. With respect to the original series of area prices, the results for Tokyo are significant at the 5% level for (1) and significant at the 1% level for (2) and (3), indicating that the series may be stationary. However, for Kansai, the statistical significance of the ADF test could not be confirmed and nonstationarity is not rejected. These findings indicate the presence of regional differences in the stationarity of the original series. Conversely, in the logarithmic difference series for both Tokyo and Kansai, the results are significant at the 1% level for (1), (2), and (3), indicating that the series are likely stationary.

Table 3 Results of ADF test for Tokyo Region

Table 4 Results of ADF test for Kansai Region

As for the original series of the keyword search volumes, under model (1), nonstationarity cannot be rejected for any keyword except “deal” and “coal” in both Tokyo and Kansai. For (2) and (3), some keywords exhibit statistical stationarity and others do not. Although no clear distinction based on clusters or regions is observed, fewer keywords belonging to Group B and clusters 2, 3, and 4 reject nonstationarity. For the logarithmic difference series, except for “COVID-19,” the null hypotheses for (1), (2), and (3) are rejected at the 1% level in both Tokyo and Kansai, suggesting a high likelihood of statistical stationarity. Therefore, some keywords may be stationary even in the original series in terms of search volume; however, it is difficult to clearly reject nonstationarity in light of the results in (1).

Based on the above results, the logarithmic difference series of area prices and search volumes is applied to the analysis in the next section. This allows for a more rigorous statistical evaluation and direct comparison between area prices and keyword search volumes across regions.

Evaluation of the relationship between area prices and search volumes using statistical methods

Table 5 presents the analysis results of the relationship between area prices and search volumes. Table 5b shows the Pearson correlation coefficient (R) of search volumes with the area price in each region. L(0) indicates that the number of lags is zero, that is, the area price and search volume on the same day. In Table 5, keywords with a correlation coefficient of 0.4 or higher in either Tokyo or Kansai are extracted. Appendix C shows the outcomes for all keywords. Table 5c shows the statistical estimates based on the VAR(1) models, which predict the area price on the day (L0) from the area price and the search volume for one of the keywords on the previous day (L1). A logarithmic difference series is employed for the evaluation. The time-series clustering results corresponding to Table 2 are also shown in Table 5a for reference purposes.
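
The same-day correlation screening described above can be sketched as follows. The function names and the dictionary input format are hypothetical; the 0.4 threshold is the criterion stated in the text, and both inputs are assumed to be logarithmic difference series of equal length.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def select_keywords(price_series, volume_by_keyword, threshold=0.4):
    """Keep keywords whose same-day correlation with the price meets the threshold."""
    selected = {}
    for keyword, volumes in volume_by_keyword.items():
        r = pearson_r(price_series, volumes)
        if r >= threshold:
            selected[keyword] = r
    return selected


# Toy series: "a" moves with the price, "b" moves against it.
toy_price = [1.0, 2.0, 3.0]
toy_volumes = {"a": [2.0, 4.0, 6.0], "b": [3.0, 2.0, 1.0]}
kept = select_keywords(toy_price, toy_volumes)
```

The screening uses lag zero only; whether the previous day's search volume helps explain the price is the separate question addressed by the VAR(1) models.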

Table 5 Statistical analysis results with Pearson correlation coefficient and VAR model

When a correlation coefficient of 0.4 or higher is used as the criterion for a relatively high positive correlation between area price and search volume, the applicable keywords common to Tokyo and Kansai are “business,” “electric power,” and “industry” from Group A and “Google,” “Yahoo,” and “translation” from Group B; “technology,” “development,” and “plan” from Group A are applicable only for Tokyo. As for the assigned clusters, for Tokyo, seven of the nine keywords are found in Cluster 1 and one each in Clusters 2 and 3. For Kansai, three of the six keywords come from Cluster 1, two from Cluster 2, and one from Cluster 4. Keywords that are highly correlated with area prices are thus found most frequently in Cluster 1.
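The screening step described above can be sketched as follows. This is a minimal illustration, assuming same-day (lag 0) logarithmic difference series as inputs; the function names `pearson_r` and `screen_keywords`, the keyword labels, and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

def screen_keywords(price_ld, volumes_ld, threshold=0.4):
    """Keep keywords whose lag-0 correlation with the price meets the threshold."""
    selected = {}
    for keyword, series in volumes_ld.items():
        r = pearson_r(price_ld, series)
        if r >= threshold:
            selected[keyword] = r
    return selected

# Hypothetical log-difference series for illustration only.
price_ld = [0.10, -0.05, 0.20, -0.10]
volumes_ld = {
    "keyword_a": [0.20, -0.10, 0.40, -0.20],   # moves with the price
    "keyword_b": [-0.20, 0.10, -0.40, 0.20],   # moves against the price
}
selected = screen_keywords(price_ld, volumes_ld)
```

Only positively correlated keywords pass the filter, which matches the criterion of a correlation coefficient of 0.4 or higher.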

Macroscopically, with regard to the VAR model analysis, only 12 models for Tokyo confirm the 1% or 5% significance of the regression coefficients for area price (L1). Thus, in many Tokyo models, the possibility that the regression coefficient of area price (L1) is zero cannot be rejected. In Kansai, however, the significance of the regression coefficients for area price (L1) is confirmed for all models. This result implies that the area price on the day in Tokyo is less influenced by the previous day's price than in Kansai (see also Table 7 in Appendix C).

As for search volume (L1), when the models whose regression coefficients are significant at the 1% or 5% level are extracted, there are 30 keywords (Group A: 21, Group B: 9) for Tokyo and 35 keywords (Group A: 20, Group B: 15) for Kansai, of which 24 keywords (Group A: 15, Group B: 9) are common to both regions. The significance of the regression coefficient is confirmed for the keywords that show relatively high positive correlations with area prices, such as “business” and “Google,” except for “electric power” in Tokyo. Comparing Groups A and B, the regression coefficients tend to be greater in Group B (see also Table 7 in Appendix C). Although no clear regional differences are identified, the regression coefficients are generally higher for Kansai.
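The price equation of a bivariate VAR(1) model of the kind discussed above can be sketched with ordinary least squares, since an unrestricted VAR can be estimated equation by equation. This is a minimal sketch, not the study's implementation: the inclusion of a constant term, the function names `solve_linear` and `fit_price_equation`, and the synthetic data are all assumptions, and significance tests are omitted.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_price_equation(price_ld, volume_ld):
    """OLS fit of price_t = c + a * price_{t-1} + b * volume_{t-1}."""
    rows = [[1.0, p, v] for p, v in zip(price_ld[:-1], volume_ld[:-1])]
    y = price_ld[1:]
    # Normal equations (X'X) beta = X'y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    const, a, b = solve_linear(xtx, xty)
    return {"const": const, "price_L1": a, "volume_L1": b}

# Synthetic, noise-free series built from known coefficients (illustration only).
volume_ld = [0.10, -0.20, 0.05, 0.30, -0.15, 0.20, -0.05, 0.10]
price_ld = [0.0]
for v in volume_ld[:-1]:
    price_ld.append(0.01 + 0.2 * price_ld[-1] + 0.5 * v)

coef = fit_price_equation(price_ld, volume_ld)
```

In this sketch, `coef["volume_L1"]` plays the role of the search volume (L1) regression coefficient and `coef["price_L1"]` that of the area price (L1) coefficient.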

These results suggest that the logarithmic difference series of the following keywords are partially linked to the area price on the day, and changes in the area price for the day may be captured from changes in the search volume for the previous day:

Tokyo Region

Cluster 1 – Group A: “business,” “technology,” “development,” “industry,” “plan”
Cluster 1 – Group B: “Google,” “translation”
Cluster 2 – Group B: “Yahoo”

Kansai Region

Cluster 1 – Group A: “business,” “industry”
Cluster 1 – Group B: “translation”
Cluster 2 – Group A: “electric power”
Cluster 2 – Group B: “Yahoo”
Cluster 4 – Group B: “Google”

In Fig. 5, these keywords are plotted at middle or higher values of both the mean and the standard deviation of search volume. When Groups A and B are compared, the keywords in Group A have higher standard deviations. Although there are some differences in the words extracted from Group A between Tokyo and Kansai, the search volumes of these keywords may move relatively strongly before the area price changes. In Group B, the common keywords “Google,” “translation,” and “Yahoo” are extracted for both Tokyo and Kansai. These are popular keywords associated with general web search behavior and have high absolute search volumes.

Figures 6 and 7 show the orthogonalized impulse responses of each area price when a one-standard-deviation shock in each keyword search volume is applied to the VAR models. The analysis helps clarify the influence of keyword search volumes on area price by isolating the error terms associated with each variable within the model. Figure 6 is for the Tokyo region, and Fig. 7 is for the Kansai region. The dotted lines in each graph indicate the 95% confidence interval. Since this study assumes the use of a keyword search volume as a one-day leading indicator of each area price, the response at a value of one on the horizontal axis of each graph represents the lag effect of the keyword search volume on the area price. In Fig. 6b, the lower dotted line is below zero while the upper dotted line is above zero, so the 95% confidence interval includes zero. This means that the search volume of “electric power” in the Tokyo region may not be a leading indicator of the area price at the 95% confidence level. In contrast, the orthogonalized impulse response analysis shows a one-day lag effect for the search volumes of the other keywords in Figs. 6 and 7. Moreover, the lag effect in the Kansai region is greater than that in the Tokyo region. These results imply that human search behavior is linked to area prices to some extent, and the observation of these keywords may help detect sudden changes in area prices.
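The orthogonalized impulse response of a bivariate VAR(1) model can be sketched as Θ_h = A^h P, where A is the coefficient matrix and P is the lower-triangular Cholesky factor of the residual covariance matrix. This is a minimal sketch under stated assumptions: the variable ordering [price, search volume], the coefficient and covariance values, and the function names are hypothetical, and the confidence intervals shown in the figures are not computed.

```python
import math

def cholesky_2x2(sigma):
    """Lower-triangular P with P P' = sigma for a 2x2 covariance matrix."""
    (s11, s12), (_, s22) = sigma
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def orthogonalized_irf(A, sigma, horizons):
    """Return [Theta_0, ..., Theta_horizons] with Theta_h = A^h P."""
    theta = cholesky_2x2(sigma)
    responses = [theta]
    for _ in range(horizons):
        theta = matmul(A, theta)
        responses.append(theta)
    return responses

# Hypothetical VAR(1) coefficients, variables ordered as [price, search volume].
A = [[0.1, 0.5],
     [0.0, 0.2]]
# Hypothetical residual covariance matrix.
sigma = [[0.04, 0.00],
         [0.00, 0.09]]

irf = orthogonalized_irf(A, sigma, horizons=3)
# Response of price (row 0) to a one-s.d. search volume shock (column 1), one day later.
price_response_day1 = irf[1][0][1]
```

With this ordering, the search volume shock has no same-day effect on the price by construction, so the value at horizon one isolates the one-day lag effect. A different ordering would generally give different responses.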

Conclusion

This study highlights the lack of sufficient discussions on whether social data through the Web can help mitigate the uncertainties of human behavior’s influence on the electricity market, even though the uncertainties in the market may discourage RE companies’ proactive electricity deals. This study hypothesizes that keyword search volumes, which are social data taken from the Web, might help identify signs of changes in electricity prices, and clarifies the relationship between keyword search volumes and electricity market prices. Keyword search volumes are categorized by the time-series clustering method. The relationship is evaluated by statistical methods, and the Tokyo and Kansai regions are the focus of this study.

This study draws several conclusions. First, new insights into the stationarity of the time-series data on area prices and keyword search volumes are obtained. Regarding area prices, regional differences are observed; the area prices of Tokyo are possibly stationary in both the original series and the logarithmic difference series. However, for Kansai, the original series is likely to be nonstationary, whereas the logarithmic difference series is likely to be stationary. Although some previous analyses in other countries suggest that electricity prices are nonstationary in the original series [44, 45] and become stationary by taking the difference series [45], no studies have explored regional differences. Looking at the composition of power sources in Japan, thermal power generation accounts for a large proportion of power generation in the Tokyo region, with natural gas in particular accounting for more than 50% of the total [34, 46]. On the other hand, the share of nuclear power generation is relatively larger in the Kansai region, and natural gas and coal are almost equally represented in thermal power generation [35, 47]. The Kansai region has more varied power sources, which means that the cost of power generation changes more sensitively with changes in demand. This may explain the difference in stationarity of the original series of area prices, although further investigation is needed. The regional differences in the stationarity of electricity prices imply that the composition of power sources needs to be considered when constructing electricity price forecasting models. As for search volumes, although some differences by keyword are observed, no clear regional differences are confirmed, and the possibility of nonstationarity for the original series could not be clearly rejected. Meanwhile, the logarithmic difference series is likely to be stationary.
Although the stationarity of keyword search volumes has not been fully discussed in previous studies, the fact that they can be made stationary by taking the logarithmic difference is an important finding for building statistical forecasting models that use keyword search volume as an explanatory variable. Considering the above, the logarithmic difference series is preferable for a statistically reliable time-series analysis of area prices and search volumes, as well as of economic indicators such as stock prices.

Second, the correlation of area prices on the day with search volumes on the day is evaluated using the Pearson correlation coefficient of the logarithmic difference series. In addition, regression analysis with the VAR model is used to explain the changes in area prices for the day by those in keyword search volumes for the previous day. We find that area prices move in association with the search volumes of some keywords and that changes in area prices could be captured by the search volumes of those keywords on the previous day. Specifically, “business,” “technology,” “development,” “industry,” and “plan” for Tokyo and “business,” “industry,” and “electric power” for Kansai are extracted from the keywords used in the context of energy. As for keywords showing social interest (Group B), “Google,” “translation,” and “Yahoo” are common to both regions; these are popular keywords associated with general web search behavior that have high absolute search volumes. Moreover, the mean and standard deviation of the search volumes for these keywords are moderate or higher, and these keywords are categorized into specific clusters based on waveform similarity. Although similar keywords are extracted from both Tokyo and Kansai, the regression coefficients tend to be larger for Group B than for Group A and larger for Kansai than for Tokyo. As mentioned in the introduction, previous studies of the stock market have suggested that changes in the volume of some keyword searches correlate with stock trading and can identify early warning signs of stock market moves [26, 28, 29]. It is thought-provoking that the results of this study are consistent with those previous analyses even though the JEPX has fewer participants and different trading rules compared with the stock markets.
Investigating the background events and social conditions that link the search volume of these keywords with a day-ahead lag effect could provide a better understanding of the potential impact of public interest and sentiment on electricity market prices.

These results suggest that the logarithmic difference series of specific keywords used in energy contexts such as “business” and “industry” and popular words in daily search behavior such as "Google," "translation," and "Yahoo" can be effective as explanatory variables for area price prediction models and help capture the signs of area price spikes. Despite the significant and valuable results of this study, the following conditions are worth mentioning for future studies.

1. This study applies statistical methods to investigate the relationship between keyword search volumes and area prices and shows that the area prices for the day “tend to change when” search volumes of some keywords for the previous day have changed. While this result is statistically appropriate, it does not show a causal relationship. It does not necessarily indicate that the area prices on the day change “because” search volumes of some keywords on the previous day have changed.
2. This study focuses on Tokyo and Kansai in FY 2022 as a case study because they have the largest web search volumes and electricity deals. However, future studies can examine the differences between other regions over multiple years to obtain more general insights. The evaluation procedure and findings of this study are helpful for future investigations.

Considering the outcomes of this study and the conditions mentioned above, future work will aim to develop a price spike prediction model that uses search volumes to mitigate the uncertainties caused by human behavior in the market.

Lastly, the findings of this study offer several policy implications, particularly in shaping decision-making processes within the electricity market, although the development of the prediction model for price spikes in the electricity market is future work. Firstly, the correlation between online search behavior and electricity market prices can be a valuable tool for policymakers and market regulators. Understanding the potential influence of public interest and sentiment, as reflected in web searches, on market prices can aid in developing more responsive and adaptive market strategies. Secondly, regulators can use these insights to anticipate market fluctuations. Incorporating web search data into predictive models makes it possible to foresee and mitigate abrupt price spikes, thus ensuring market stability and consumer protection. Thirdly, observation of regional differences in the correlation between search volumes and market prices can suggest region-specific policy approaches. While this study focuses on Tokyo and Kansai, the methodology is adaptable to other regions in Japan and globally. The unique market dynamics and consumer behavior patterns in each region or country can be analyzed similarly to derive region-specific insights. Policymakers may need to consider these regional nuances when designing regulations and interventions in the electricity market. In summary, integrating web search data analysis in the electricity market opens new avenues for informed decision-making and policy development. Future research in this area can further refine these approaches, offering more nuanced insights into the interplay between public sentiment and market dynamics.

Data availability

The dataset on the keyword search volumes is available at https://trends.google.co.jp/trends. The electricity market price at JEPX is available at https://www.jepx.jp/en/electricpower/market-data/spot/.

References

1. Ministry of Economy Trade and Industry. (2022). “Energy White Paper 2022” [Online]. Available: https://www.enecho.meti.go.jp/about/whitepaper/2023/html/
2. Agency for Natural Resources and Energy. (2022). “FIT and FIP guidebook for renewable energy”. [Online]. Available: https://www.enecho.meti.go.jp/category/saving_and_new/saiene/data/kaitori/2022_fit_fip_guidebook.pdf
3. Ministry of Economy Trade and Industry. (2022). “Report for Imbalance fee system.” [Online]. Available: https://www.emsc.meti.go.jp/info/public/pdf/20220117001b.pdf
4. Nakayama, S., Shiota, A., Mitani, Y., & Watanabe, M. (2023). JEPX spot prices forecasting system using GIS. Energy Reports, 9, 329–336. https://doi.org/10.1016/j.egyr.2022.10.419
5. Kaneko, N., & Inoue, T. (2023). Sparse modeling approach for day-ahead electricity price forecast and factor analysis in the Japanese spot market. Journal of Japan Society of Energy and Resources, 44(4), 160–170.
6. Ohta, Y., Tani, Y., Sugimoto, J., Yokoyama, R. (2006). Novel Price Prediction by Using Neural Network Under Large Volatility in Electric Power Exchange. In Proc. 6th WSEAS Int. Conf. Power Syst. Lisbon, Port., pp. 143–148.
7. Yamada, Y., Makimoto, N., & Ryuta, T. (2015). JEPX spot price prediction and bid volume-price function estimation using a generalized additive model. Jafee Journal, 14, 8–39.
8. Adline, B., & Ikeda, K. (2023). A Hawkes model approach to modeling price spikes in the Japanese electricity market. Energies, 16(4), 1570. https://doi.org/10.3390/en16041570
9. Maekawa, J., & Shimada, K. (2019). A speculative trading model for the electricity market: Based on Japan electric power exchange. Energies, 12(15), 1–15. https://doi.org/10.3390/en12152946
10. Ogimoto, K., Iwafune, Y., Urabe, C., Azuma, H., & Isonaga, A. (2021). Analysis of a spot energy market price using power production simulation model. Journal of Japan Society of Energy and Resources, 42(4), 185–193.
11. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
12. Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
13. Klein, M., & Deissenroth, M. (2017). When do households invest in solar photovoltaics? An application of prospect theory. Energy Policy, 109(July), 270–278. https://doi.org/10.1016/j.enpol.2017.06.067
14. Salm, S., Hille, S. L., & Wüstenhagen, R. (2016). What are retail investors’ risk-return preferences towards renewable energy projects? A choice experiment in Germany. Energy Policy, 97, 310–320. https://doi.org/10.1016/j.enpol.2016.07.042
15. Gotoh, R., Tezuka, T., & McLellan, B. C. (2022). Study on behavioral decision making by power generation companies regarding energy transitions under uncertainty. Energies, 15(2), 1–29. https://doi.org/10.3390/en15020654
16. Nicolson, M., Huebner, G., & Shipworth, D. (2017). Are consumers willing to switch to smart time of use electricity tariffs? The importance of loss-aversion and electric vehicle ownership. Energy Research & Social Science, 23(2017), 82–96. https://doi.org/10.1016/j.erss.2016.12.001
17. Lazer, D. M. J., et al. (2020). Computational social science: Obstacles and opportunities. Science, 369(6507), 1060–1062. https://doi.org/10.1126/science.aaz8170
18. Golder, S. A., & Macy, M. W. (2014). Digital footprints: Opportunities and challenges for online social research. Annual Review of Sociology, 40, 129–152. https://doi.org/10.1146/annurev-soc-071913-043145
19. Mejova, Y., Weber, I., & Macy, M. (2015). Twitter: A digital socioscope. Cambridge University Press. https://doi.org/10.1017/CBO9781316182635
20. Kazutoshi, S. (2021). What’s computational social science. Introduction to computational social science (p. 9). Maruzen Publishing.
21. Zhang, X., Fuehres, H., & Gloor, P. A. (2011). Predicting Stock market indicators through Twitter ‘I hope it is not as bad as I fear.’ Procedia-Social and Behavioral Sciences, 26(2007), 55–62. https://doi.org/10.1016/j.sbspro.2011.10.562
22. Nofer, M., & Hinz, O. (2015). Using Twitter to predict the stock market: where is the mood effect? Business & Information Systems Engineering, 57(4), 229–242. https://doi.org/10.1007/s12599-015-0390-4
23. Souma, W., Vodenska, I., & Aoyama, H. (2019). Enhanced news sentiment analysis using deep learning methods. Journal of Computational Social Science, 2(1), 33–46. https://doi.org/10.1007/s42001-019-00035-x
24. Ersan, D., Nishioka, C., & Scherp, A. (2020). Comparison of machine learning methods for financial time series forecasting at the examples of over 10 years of daily and hourly data of DAX 30 and S&P 500. Journal of Computational Social Science, 3(1), 103–133. https://doi.org/10.1007/s42001-019-00057-5
25. Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1–8. https://doi.org/10.1016/j.jocs.2010.12.007
26. Rao, T., & Srivastava, S. (2013). Modeling movements in oil, gold, forex and market indices using search volume index and Twitter sentiments. In: WebSci ’13 Proc. 5th Annu. ACM Web Sci. Conf., pp. 336–345. https://doi.org/10.1145/2464464.2464521
27. Simon, H. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99–118.
28. Preis, T., Reith, D., & Stanley, H. E. (2010). Complex dynamics of our economic life on different scales: Insights from search engine query data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1933), 5707–5719. https://doi.org/10.1098/rsta.2010.0284
29. Preis, T., Moat, H. S., & Stanley, H. E. (2013). Quantifying trading behavior in financial markets using Google Trends. Scientific Reports, 3, 1–6. https://doi.org/10.1038/srep01684
30. The Federation of Electric Power Companies of Japan. (2023). Electricity Review Japan 2023. [Online]. Available: https://www.fepc.or.jp/english/library/electricity_eview_japan/__icsFiles/afieldfile/2023/04/06/electricity_2023.pdf
31. Japan Electric Power Exchange. Introduction of JEPX. https://www.jepx.jp/en/. Accessed 5 Sep 2023.
32. Japan Electric Power Exchange. (2019). Japan Electric Power eXchange Guide [Online]. Available: https://www.jepx.jp/electricpower/outline/pdf/Guide_2.00.pdf?timestamp=1696303835141
33. Japan Electric Power Exchange. JEPX Spot Price Historical Records. https://www.jepx.jp/electricpower/market-data/spot/. Accessed 15 Jun 2023.
34. TEPCO Power Grid. Electricity Supply Demand Historical Data. https://www.tepco.co.jp/forecast/html/area_data-j.html. Accessed 15 Jul 2023.
35. Kansai Transmission and Distribution. Electricity Supply Demand Historical Data. https://www.kansai-td.co.jp/denkiyoho/area-performance.html. Accessed 15 Jul 2023.
36. Google. Google Trends. https://trends.google.co.jp/trends/. Accessed 11 Jun 2023.
37. Luhn, H. P. (1957). A statistical approach to mechanized encoding and searching of literary information. IBM Journal of Research and Development, 1(4), 309–317. https://doi.org/10.1147/rd.14.0309
38. Jones, K. S. (1972). A Statistical Interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21.
39. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513–523. https://doi.org/10.1163/187631286X00251
40. Tavenard, R. tslearn.clustering.TimeSeriesKMeans. https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.TimeSeriesKMeans.html. Accessed 15 Jun 2023.
41. Aghabozorgi, S., Seyed Shirkhorshidi, A., & Ying Wah, T. (2015). Time-series clustering: A decade review. Information Systems, 53, 16–38. https://doi.org/10.1016/j.is.2015.04.007
42. Arthur, D., & Vassilvitskii, S. (2007). k-means++: the advantages of careful seeding. In: Proc. eighteenth Annu. ACM-SIAM Symp. Discret. algorithms, Soc. Ind. Appl. Math. Philadelphia, pp. 1027–1035.
43. Berndt, D.J., & Clifford, J. (1994). Using Dynamic Time Warping to Find Patterns in Time Series. In: AAAIWS’94 Proc. 3rd Int. Conf. Knowl. Discov. Data Min., pp. 359–370.
44. Ferreira, Â. P., Ramos, J. G., & Fernandes, P. O. (2019). A linear regression pattern for electricity price forecasting in the Iberian electricity market. Revista Facultad de Ingeniería Universidad de Antioquia, 93, 117–127. https://doi.org/10.17533/udea.redin.20190522
45. Tabatabaei, T. S., & Asef, P. (2021). Evaluation of energy price liberalization in electricity industry: A data-driven study on energy economics. Energies, 14(22), 1–19. https://doi.org/10.3390/en14227511
46. TEPCO. Power supply mix and non-fossil certificate usage status. https://www.tepco.co.jp/ep/power_supply/index-j.html. Accessed 24 Mar 2023.
47. KEPCO. Power supply composition and CO2 emission factor. https://kepco.jp/ryokin/power_supply/. Accessed 24 Mar 2023.

Acknowledgements

The author would like to thank Editage (www.editage.jp) for English language editing.

Funding

Not applicable.

Author information

Authors and Affiliations

Faculty of Economics, Shiga University, 1-1-1, Banbacho, Hikone, Shiga, 522-8522, Japan
Ryosuke Gotoh

Contributions

The author confirms sole responsibility for the following: study conception and design, data collection, analysis and interpretation of results, and manuscript preparation.

Corresponding author

Correspondence to Ryosuke Gotoh.

Ethics declarations

Conflict of interest

The author declares that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Relevance of target regions to obtain data from Google Trends

This study targets the search volume of “Tokyo Prefecture” and “Osaka Prefecture” as representative districts of Tokyo and Kansai, respectively. Ideally, the target areas for spot prices (area prices) and keyword search volumes should match. However, it is difficult to match them perfectly; the target areas for area prices are defined in terms of electricity distribution areas such as those of Tokyo Electric Power Company and Kansai Electric Power Company, while keyword search volumes can only be obtained as relative values on a prefecture-by-prefecture basis. As Table 6 shows, very high positive correlations are confirmed between the search volumes of representative keywords in the Tokyo region and those in Tokyo Prefecture, and between those in the Kansai region and those in Osaka Prefecture. Therefore, this study assumes that Tokyo and Osaka Prefectures represent the Tokyo and Kansai regions, respectively. Area consistency from a data acquisition point of view is left for future work.

Table 6 Correlation coefficient of keyword search volumes in each region

Appendix B: Evaluation of the number of clusters using the elbow method

This study uses the TimeSeriesKMeans method to cluster the keywords to be evaluated. The time-series data of search volume are used as the feature values. Figure 8 shows a graph with the number of clusters on the horizontal axis and the sum of squares of intracluster errors on the vertical axis. In the elbow method, the number of clusters is judged appropriate at the point where the decrease in the sum of squared intracluster errors with respect to the number of clusters begins to level off. In Fig. 8, the slope of the graph starts to become gentler when the number of clusters is approximately 5. Therefore, the number of clusters in this study is set to 5. However, selecting a certain number of clusters involves subjectivity.
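One way to make the elbow judgment less subjective is to pick the number of clusters where the curve bends most sharply. The sketch below uses the largest second difference of the error curve as a simple heuristic; this is a swapped-in illustration rather than the procedure used in this study, and the function name `elbow_by_second_difference` and the error values are hypothetical (they are not read from Fig. 8).

```python
def elbow_by_second_difference(inertia_by_k):
    """Return the k where the error curve bends most sharply.

    inertia_by_k maps consecutive cluster counts to the sum of squared
    intracluster errors. The bend at k is measured by the discrete second
    difference inertia[k-1] - 2 * inertia[k] + inertia[k+1].
    """
    ks = sorted(inertia_by_k)
    if len(ks) < 3:
        raise ValueError("need at least three cluster counts")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = inertia_by_k[prev_k] - 2 * inertia_by_k[k] + inertia_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical error values for illustration only.
inertia_by_k = {1: 100.0, 2: 70.0, 3: 45.0, 4: 40.0, 5: 37.0, 6: 35.0}
chosen_k = elbow_by_second_difference(inertia_by_k)
```

A heuristic like this only supplements visual inspection of the curve, since different heuristics can point to different cluster counts.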

Appendix C: Statistical analysis results with Pearson correlation coefficient and VAR model for all keywords

Table 5 shows only keywords with a correlation coefficient of 0.4 or higher in either the Tokyo or Kansai region. The results of all keywords are shown in Table 7.

Table 7 Statistical analysis results with Pearson correlation coefficient and VAR model

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Gotoh, R. Analyzing the influence of web search behavior on electricity market price: a case study of Japan electric power exchange. J Comput Soc Sc (2024). https://doi.org/10.1007/s42001-024-00259-6

Received: 09 November 2023
Accepted: 14 February 2024
Published: 03 April 2024
DOI: https://doi.org/10.1007/s42001-024-00259-6

Keywords

Introduction