On the construction of Chinese stock market investor sentiment index

Measuring investor sentiment has become one of the most widely examined areas in behavioral finance. To measure investor sentiment more accurately, we classify investor sentiment proxies into six market dimensions for the first time and construct an investor sentiment monitoring index system. Using principal component analysis (PCA) and excluding macroeconomic factors, we construct a comprehensive investor sentiment index for the Chinese stock market. Our results show that this index has good predictive ability for the stock market.
Subjects: Psychological Science; Testing, Measurement and Assessment; Economic Psychology


Introduction
Investor sentiment is an important cornerstone of behavioral finance and has attracted wide attention in academic circles. Baker and Wurgler (2007) divided the study of investor sentiment into bottom-up and top-down approaches. The bottom-up approach starts from individual investor psychology, using biases such as overconfidence, representativeness, and conservatism to explain overreaction and underreaction to fundamental value and past returns. The top-down approach takes a macro perspective: under limited arbitrage, it studies the construction of investor sentiment indices, the effect of investor sentiment on market and individual stock returns, its impact on asset pricing, and its influence on corporate finance.
The measurement of investor sentiment has been one of the key and most difficult problems in behavioral finance. Most previous studies have demonstrated theoretically that investor sentiment affects stock prices, but they have not produced a unified indicator for measuring investor sentiment. In the past, domestic scholars constructing investor sentiment indices usually borrowed the proxy indicators selected by foreign scholars and used the same proxy variables in all circumstances. For example, Ning (2009) selected the closed-end fund discount, trading volume, the number of IPOs and their initial returns, consumer confidence, and the number of new investor accounts to build the composite investor sentiment index of the Chinese stock market (CICSI). We believe that there are no universal investor sentiment proxy variables: it is almost impossible to ensure that a given set of proxies remains effective in all market environments, so the proxy variables need to be selected according to the actual situation.
The rest of the paper is organized as follows: Section 2 presents the literature review. Section 3 presents the data on the investor sentiment proxy indicators. Section 4 constructs the comprehensive index of investor sentiment. Section 5 concludes.

PUBLIC INTEREST STATEMENT
Investor sentiment is an important research direction in behavioral finance. The question is no longer whether investor sentiment affects stock prices, but rather how to measure investor sentiment and quantify its effects. Several empirical studies have attempted to measure investor sentiment. This paper selects nine market sentiment proxy indicators from six dimensions to form an investor sentiment monitoring index system and constructs a comprehensive investor sentiment index for the Chinese stock market. The resulting index shows good predictive ability for the Shanghai Composite Index.

Literature review
The proxy indicators of investor sentiment are usually divided into two categories, single sentiment indicators and compound sentiment indicators; single sentiment indicators can be further divided into direct and indirect indicators.
Direct sentiment indicators are obtained from surveys and directly capture investors' optimistic or pessimistic view of the market. Representative indicators include the Investors Intelligence index (II index), the American Association of Individual Investors index (AAII index), and the Wall Street analyst sentiment index.
Indirect sentiment indicators are proxy variables that reflect market sentiment through statistics on market transaction data. Because they are compiled from publicly available trading records, they are objective indicators that reflect the psychology of investors indirectly and after the event. Zweig (1973) found that closed-end fund discounts can be used as a measure of investor sentiment; subsequently, Huang (2015), Qiang and Shue (2009), and others used closed-end funds to measure investor sentiment, while other scholars used IPO volume (Honghai, Xindan, & Ziyang, 2015), turnover (Baker & Stein, 2004), turnover rate (Mei, Scheinkman, & Xiong, 2009), the number of new accounts (Yanran & Liyan, 2007), and the relative strength index (Kim & Ha, 2010) as proxies for sentiment.
A compound sentiment index gives a comprehensive reflection of market sentiment by synthesizing individual market sentiment proxy variables. The best-known index of this kind is that of Baker and Wurgler (2006), often called the B-W index. Baker and Wurgler apply principal component analysis to six variables (the closed-end fund discount, share turnover, the number of IPOs, average first-day IPO returns, the equity share in new issues, and the dividend premium) to construct an investor sentiment index, and they mainly study the effect of investor sentiment on the cross section of stock returns. They show that stocks that are difficult to value and to arbitrage react more strongly to investor sentiment than stocks with the opposite characteristics.
The method proposed by Baker and Wurgler (2006) has been widely recognized in subsequent research, and most scholars working on composite investor sentiment indicators now follow it, for example Yuan (2012) and Ben-Rephael, Kandel, and Wohl (2012); domestic scholars such as Zhigao and Ning (2008), Huang, Wen, and Yang (2009), Yumei and Mingzhao (2010), Yuan (2012), Qiang and Shue (2009), and Ma and Zhang (2015) have also adopted this method. These studies all show that investor sentiment is well represented by an index constructed with the B-W method and that investor sentiment has a significant impact on stock market returns.
In summary, studies that represent sentiment with a single proxy variable reach differing conclusions. This may be because, in a complex market, a single proxy variable usually reflects only one aspect of sentiment, so the empirical results depend on the particular data used and different researchers draw different conclusions. In the B-W method, principal component analysis is used to extract the most important common element from the individual sentiment proxy variables. The resulting comprehensive index overcomes the limitation of reflecting investor sentiment with a single variable and lays the foundation for further study of investor sentiment. At present, most scholars follow Baker and Wurgler in constructing composite investor sentiment indices. We believe that the B-W method is a good choice for constructing a sentiment index, but that when it is used, the proxy variables need to be selected according to the actual situation.

Data
Drawing on the existing research literature and practical experience, and in order to reflect comprehensively the various aspects of market sentiment, we select market sentiment indicators from as many market dimensions as possible and form a monitoring system of investor sentiment indicators.

Investor sentiment index monitoring system
The essence of studying investor sentiment is to find proxy indicators that can express sentiment. These proxies should be observable and quantifiable, and should objectively reflect investors' views on the market. Only by choosing appropriate sentiment proxy variables can we construct an index that measures investor sentiment accurately. To reflect the various aspects of market sentiment, we selected nine market sentiment proxy indicators from six dimensions to form the investor sentiment monitoring index system.

Market transaction indicator
Proxy variables of transaction behavior directly reflect the behavior of investors and are an effective reflection of investors' view of the market. According to their meaning, these proxy variables can be divided into three categories: (1) indicators of trading activity, such as turnover; (2) indicators of potential investor growth in the market, such as indicators related to the number of new A-share stock accounts; (3) indicators that reflect investors' perceptions of the market, such as new investor accounts and margin-related indicators. The indicators we selected are trading volume, turnover, and new investor stock accounts, abbreviated as TURNVOL, TURNOVER, and NAA, respectively.

Market structure indicator
Many indicators can reflect market structure; here, we selected the proportion of rising to falling stocks in the Shanghai and Shenzhen stock markets and the SWS loss index. The higher the share of rising stocks and the lower the share of falling stocks, the more the market structure points to optimism about the market outlook, and the higher investor sentiment is. The proportion of rising to falling stocks is abbreviated as UDR, and the SWS loss index is abbreviated as LSI.

Market valuation indicator
Among valuation indicators, the price earnings ratio and the market-to-book ratio are the most representative. They reflect not only the price level of the stock market but also the continuity of listed companies' earnings under the prevailing macroeconomic conditions, and so reflect market sentiment to a certain extent. In general, the higher the valuation indicator, the higher market sentiment. We selected the price earnings ratio as the market valuation indicator, abbreviated as PE.
https://doi.org/10.1080/23322039.2017.1412230

Institutional investor behavior indicator
Block trading refers to transactions whose size is much larger than the average size of a single market transaction. The block trading system established for such transactions differs from the normal trading system and is generally an arrangement adapted to an investor structure in which institutional investors occupy the main position. Therefore, the discount or premium at which block trades are executed reflects institutional investor sentiment: in general, trading at a premium indicates high sentiment and trading at a discount indicates low sentiment. This paper selects the block trading premium turnover ratio as the institutional investor behavior indicator, abbreviated as LTPTR.

New share indicator
Usually, when the market is hot and investor sentiment is high, the market can absorb a greater volume of newly issued shares, and investors actively pursue higher first-day returns on newly listed shares. Therefore, the number of IPOs issued in a given period can indirectly reflect investor sentiment to some extent; it is abbreviated as NIPO.

Subjective sentiment indicator
Past composite sentiment indices generally considered only objective sentiment indicators. In this paper, we choose only the consumer confidence index as the subjective indicator. It mainly reflects the confidence of individual investors, is directly proportional to investor sentiment, and is a good measure of investor sentiment (Fei, 2005). We therefore select CCI as a proxy for sentiment (see Table 1).

Control variable declaration
Taking into account how well each variable represents China's macroeconomic cycle and the availability of monthly data, this paper selected industrial added value (IAV), the consumer price index (CPI), the macroeconomic climate index (MBCI), and the manufacturing purchasing managers index (PMI) as proxies for economic fundamentals, covering three aspects: production, consumption, and the economic climate.

Discussion and results
We follow these steps to build the investor sentiment index. First, we select the basic sentiment proxy indicators, preprocess and classify the original data according to the meaning of each indicator, and form the basic index set. Second, we standardize the basic index set to form the standard index set. Third, we carry out principal component analysis on the standard index set, extract the principal components, and form the investor sentiment index. Fourth, we eliminate the

Correlation analysis
To ensure that the variables are suitable for principal component analysis, this paper first carried out a correlation analysis of the 18 variables using the KMO test and Bartlett's test of sphericity. The results are shown in Table 2: the overall KMO coefficient of the 18 variables is 0.686, and Bartlett's test gives Sig. = 0. Both tests indicate that the 18 indicators are relatively highly correlated, so the data are suitable for principal component analysis.
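The two suitability checks can be sketched in a few lines. This is a minimal illustration using the standard textbook formulas, not the authors' code: the function name `kmo_and_bartlett` is hypothetical, the input is assumed to be a months-by-variables array, and the p-value (the "Sig." column) would come from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def kmo_and_bartlett(data):
    """Return (overall KMO, Bartlett chi-square statistic, degrees of freedom)."""
    x = np.asarray(data, dtype=float)
    n, p = x.shape
    corr = np.corrcoef(x, rowvar=False)

    # Partial correlations come from the rescaled inverse correlation matrix.
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale

    # KMO compares squared correlations with squared partial correlations,
    # using off-diagonal entries only.
    off_diag = ~np.eye(p, dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    kmo = r2 / (r2 + q2)

    # Bartlett's test of sphericity: -(n - 1 - (2p + 5) / 6) * ln|R|.
    _, logdet = np.linalg.slogdet(corr)
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return float(kmo), float(chi_square), dof
```

A KMO value closer to 1 and a large Bartlett statistic both point to variables that share enough common variation for principal component analysis.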

Determination of lead and lag variables
Before the principal component analysis, it is necessary to determine the lead-lag relationship of each source indicator, because different indicators may affect investor sentiment with a lead or a lag. First, principal component analysis is applied to the lead and lag variables of the nine indicators (18 variables in total) to construct a preliminary investor sentiment index (SMT). In calculating SMT, this paper modifies the B-W index construction method: we strictly follow the statistical standard that the cumulative variance explained be at least 85% and take the weighted average of the first through fifth principal components (here the cumulative variance explained is 87.88%). We then compute the correlation between this preliminary SMT and the lead and lag variable of each of the nine indicators, and for each indicator choose the variable with the larger correlation coefficient, giving nine variables as the sources of the synthetic sentiment index (SMT).
As Table 3 shows, the indicators pass the significance test, and SMT is relatively highly correlated with LSI_{t−1}, TURNVOL, NIPO_{t−1}, CCI_{t−1}, TURNOVER, UDR, NAA_{t−1}, PE_{t−1}, and LTPTR_{t−1}. The turnover rate, the rise-fall ratio, new stock accounts, the price earnings ratio, and the block trading premium turnover ratio reflect the investor sentiment index in advance. We select these nine variables as the final SMT variables.
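The selection rule described above can be illustrated with a short sketch. It assumes that the two candidates for each indicator are its current value x_t and its one-month lag x_{t−1} (consistent with the t−1 subscripts in the text), that a preliminary index has already been computed, and that "larger correlation" is judged by absolute value; the function name `choose_lead_or_lag` is hypothetical.

```python
import numpy as np

def choose_lead_or_lag(prelim_index, indicators, names):
    """For each indicator, keep x_t or x_{t-1}, whichever correlates more with the index."""
    idx = np.asarray(prelim_index, dtype=float)
    x = np.asarray(indicators, dtype=float)
    chosen = {}
    for j, name in enumerate(names):
        # Drop the first month so both candidates cover the same observations.
        r_now = np.corrcoef(idx[1:], x[1:, j])[0, 1]
        r_lag = np.corrcoef(idx[1:], x[:-1, j])[0, 1]
        # Assumption: compare absolute correlations; ties go to the current value.
        chosen[name] = "t" if abs(r_now) >= abs(r_lag) else "t-1"
    return chosen
```

Called with the preliminary index and a months-by-nine array of indicators, the function returns a dictionary mapping each indicator name to "t" or "t-1".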

SMT construction (without excluding macroeconomic factors)
First, we standardize the nine source indicators: LSI, TURNVOL, NIPO, CCI, TURNOVER_{t−1}, UDR_{t−1}, LTPTR_{t−1}, NAA_{t−1}, and PE_{t−1}. Then, principal component analysis is carried out on the nine variables. The cumulative variance explained by the first through fifth principal components is 91.623%. The resulting SMT equation and the correlation coefficients between SMT and the variables are shown in formula (1) and Table 4.
From Table 4, we can read off the correlation between the investor sentiment index (SMT) and the nine variables. The turnover rate, the number of new shares, the rise-fall ratio, the price earnings ratio, and the block trading premium ratio reflect investor sentiment in advance; that is, the greater the earlier value of these variables, the higher subsequent investor sentiment will be.
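A minimal sketch of the standardize-then-PCA construction follows. The text does not state how the retained components are weighted, so the sketch assumes each component is weighted by its share of explained variance; the sign convention (flip each component so its loadings sum to a non-negative value) and the function name `weighted_pc_index` are likewise assumptions for illustration.

```python
import numpy as np

def weighted_pc_index(data, min_cum_var=0.85):
    """Return (index series, number of components kept, cumulative variance explained)."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each column

    # Eigen-decomposition of the correlation matrix, largest eigenvalue first.
    eigval, eigvec = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Assumption: fix the arbitrary sign of each component.
    eigvec *= np.where(eigvec.sum(axis=0) < 0, -1.0, 1.0)

    # Keep the smallest number of components reaching the variance threshold.
    share = eigval / eigval.sum()
    cum = np.cumsum(share)
    k = min(int(np.searchsorted(cum, min_cum_var)) + 1, len(share))

    # Assumption: weight each kept component by its share of explained variance.
    weights = share[:k] / cum[k - 1]
    scores = z @ eigvec[:, :k]
    return scores @ weights, k, float(cum[k - 1])
```

With the paper's nine standardized variables, the 85% threshold is reported to be reached at five components (91.623%); on other data the number of components kept will differ.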

SMT construction (excluding macroeconomic factors)
The SMT built above does not eliminate the effects of macroeconomic fluctuations and other rational elements. Therefore, it cannot fully reflect the irrational sentiment of investors. We regress each of the nine source variables on industrial added value (IAV), the consumer price index (CPI), the macroeconomic climate index (MBCI), and the manufacturing purchasing managers index (PMI), and obtain the residual series from each regression. Then, principal component analysis is applied to the nine residual variables. The resulting SMTr_t equation and the correlation coefficients between SMTr_t and the variables are shown in formula (2) and Table 5.
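The residual step can be sketched with ordinary least squares. The sketch assumes each proxy is regressed on all four macro variables jointly with an intercept (the text could also be read as separate regressions); the function name `macro_residuals` is hypothetical.

```python
import numpy as np

def macro_residuals(proxies, macro):
    """Regress every proxy column on the macro variables and return the residuals."""
    y = np.asarray(proxies, dtype=float)  # shape: months x proxies
    m = np.asarray(macro, dtype=float)    # shape: months x macro variables

    # Assumption: one joint OLS regression per proxy, with an intercept.
    design = np.column_stack([np.ones(len(m)), m])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Residuals are the part of each proxy not explained by the macro variables.
    return y - design @ beta
```

By construction the residual series have zero mean and are uncorrelated in sample with IAV, CPI, MBCI, and PMI, which is what allows them to be read as the component of each proxy not driven by fundamentals.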

Correlation test
A Pearson correlation test shows that the correlation coefficient between the investor sentiment index and the monthly return of the Shanghai Composite Index is 0.606 (see Table 6). This shows that investor sentiment has a considerable impact on the stock market. Moreover, investor sentiment and the stock market tend to move together, and the sentiment index is able to predict the market ahead of time (see Figure 1).
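One simple way to check both the contemporaneous correlation and the lead is to correlate the index with returns shifted forward by a few months. This is an illustrative sketch rather than the test reported in Table 6; the function name `lead_correlation` and the `lead` parameter are hypothetical.

```python
import numpy as np

def lead_correlation(sentiment, returns, lead=0):
    """Pearson correlation between sentiment at month t and returns at month t + lead."""
    s = np.asarray(sentiment, dtype=float)
    r = np.asarray(returns, dtype=float)
    if lead > 0:
        # Align sentiment in month t with the return `lead` months later.
        s, r = s[:-lead], r[lead:]
    return float(np.corrcoef(s, r)[0, 1])
```

With `lead=0` the function gives the ordinary Pearson coefficient; a markedly positive value at `lead=1` or higher would be consistent with sentiment moving ahead of the market.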
Sentiment is a kind of endogenous market force, and changes in sentiment and in the stock market should show a certain regularity. Figure 1 shows that investor sentiment rose sharply between February 2006 and May 2007; during this time, the Shanghai Composite Index rose from 1,100 points in January 2006 to around 4,100 points in May 2007. Then came the famous "530" event: on May 30, 2007, a sudden increase in stamp duty triggered a stock market crash, with the largest decline reaching 477 points and more than 900 stocks hitting the daily limit. Figure 1 shows that sentiment also fell sharply from May to June 2007; sentiment then recovered, and the Shanghai Composite Index reached 6,124 points. From August 2007, sentiment began to fall sharply, and the stock market began its crash in November 2007. Our investor sentiment index thus fell ahead of the Shanghai Composite Index, which indicates a predictive function. From 2009 to 2014 the stock market fluctuated within a range, while the investor sentiment index SMT also fluctuated at a low level. In June 2014, SMT began to rise, while the Shanghai Composite Index began a new round of increases in September. The index we constructed therefore has good predictive ability for the Shanghai Composite Index.

$$
\mathrm{SMTr}_t = 0.323\,\mathrm{TURNVOLr}_t + 0.132\,\mathrm{NIPOr}_{t-1} + 0.064\,\mathrm{CCIr}_{t-1} + 0.319\,\mathrm{TURNOVERr}_t + 0.240\,\mathrm{UDRr}_t + 0.442\,\mathrm{NAAr}_{t-1} + 0.376\,\mathrm{PEr}_{t-1} + 0.026\,\mathrm{LTPTRr}_{t-1} + 0.327\,\mathrm{LSIr}_{t-1} \tag{2}
$$
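Equation (2) is a fixed linear combination, so evaluating it for a given month is direct. The sketch below copies the coefficients and lags from equation (2); the dictionary layout and the function name `smtr` are assumptions for illustration, and the inputs are assumed to be the residual series prepared as in the paper.

```python
# Coefficients copied from equation (2); each key is (residual series, lag in months).
EQ2_WEIGHTS = {
    ("TURNVOLr", 0): 0.323,
    ("NIPOr", 1): 0.132,
    ("CCIr", 1): 0.064,
    ("TURNOVERr", 0): 0.319,
    ("UDRr", 0): 0.240,
    ("NAAr", 1): 0.442,
    ("PEr", 1): 0.376,
    ("LTPTRr", 1): 0.026,
    ("LSIr", 1): 0.327,
}

def smtr(residuals, t):
    """Evaluate equation (2) at month index t from a dict of residual series."""
    if t < 1:
        raise ValueError("t must be at least 1 because equation (2) uses t-1 terms")
    # Each term is weight * series value at month t minus its lag.
    return sum(w * residuals[name][t - lag] for (name, lag), w in EQ2_WEIGHTS.items())
```

Here `residuals` maps each series name, such as "NAAr", to a list of monthly values, and `t` indexes the month for which the index is computed.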

Conclusion
The main contribution of this paper is to classify the relevant investor sentiment proxy indicators according to six market dimensions, which together compose the investor sentiment monitoring index system. We select float-market-value-weighted market turnover, turnover, the number of new A-share accounts, the ratio of rising to falling stocks in the Shanghai and Shenzhen stock markets, the SWS loss index, the price earnings ratio, the block trading premium turnover ratio, IPO volume, and the consumer confidence index; after eliminating macroeconomic factors, we construct a comprehensive index to measure investor sentiment in the A-share market using the method of Baker and Wurgler (2006).
Future research on the construction of investor sentiment indices can proceed in two directions. First, the structure of the sentiment proxy variables can be adjusted as market characteristics change, and the parameters of the proxy indicators, such as frequency, interval span, and lag period, can be optimized. Combining "active" and "quantitative" approaches in this way would address the instability of the proxy variables and allow an effective composite index to be constructed. Second, data mining technology can be used to explore the construction and application of a comprehensive investor sentiment index from the perspective of online public opinion.