S&P BSE Sensex and S&P BSE IT return forecasting using ARIMA

This study forecasts the return and volatility dynamics of S&P BSE Sensex and S&P BSE IT indices of the Bombay Stock Exchange. To achieve the objectives, the study uses descriptive statistics; tests including variance ratio, Augmented Dickey-Fuller, Phillips-Perron, and Kwiatkowski Phillips Schmidt and Shin; and Autoregressive Integrated Moving Average (ARIMA). The analysis forecasts daily stock returns for the S&P BSE Sensex and S&P BSE IT time series, using the ARIMA model. The results reveal that the mean returns of both indices are positive but near zero. This is indicative of a regressive tendency in the long-term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to their actual values, with few deviations. Hence, the ARIMA model is capable of predicting medium- or long-term horizons using historical values of S&P BSE Sensex and S&P BSE IT.


Introduction
Theoretical and empirical studies have revealed that the relation between stock markets and economic growth is positive (Kim et al. 2011; Guptha and Rao 2018; Mallikarjuna and Rao 2019). Investment decisions informed by stock market forecasts play a significant role in attaining the desired returns. However, stock markets are dynamic, complex, and volatile, so forecasting stock prices and returns is a challenging task. Stock or investment returns depend on many factors, primarily the prediction of stock movements. The prediction and estimation of stock returns on a given stock exchange occurs on an hourly basis. Considering the importance of forecasting stock prices and their returns, researchers have paid significant attention to enhancing model accuracy in the prediction of stock price movements and returns. The fundamental explanation is that investors, policymakers, and financial institutions must be dynamic and excel in their decision making in order to optimize the returns on their investments. When stock markets are efficient, capital assets are allocated in the best possible way (Fama 1970). The efficient market hypothesis (EMH) (Fama 1965) asserts that a market is efficient when prices fully reflect the available information. Market efficiency has three forms: weak, semi-strong, and strong. The weak form states that future prices cannot be predicted from historical prices. The semi-strong form states that prices reflect all publicly accessible information. The strong form states that prices reflect all information, both public and inside. Because this study uses only public information, the strong form is outside its scope. If a prediction model can provide a good estimation of the movement of stock prices, then the uncertainty and risk involved in the investment process could be minimized.
It would thus be useful for investors and policymakers to make appropriate investment decisions and take the measures required to improve the flow of investments in stock markets. Several techniques have been used to forecast the stock market. The main purpose of forecasting is to assist in investment decisions, improve investors' accuracy, and enhance performance. However, general uncertainty in the stock market may change or disrupt its consistency. This uncertainty could be overcome by applying appropriate stock market strategies based on accurate forecasting tools (Zhang et al. 2019a, 2019b). Accurate and fast forecasting of the stock market is the main challenge. Many researchers have focused on finding the best forecasting tools and methods to obtain fast and accurate predictions of stock prices (Javier and Rosario 2003). In time series analysis, the autoregressive integrated moving average (ARIMA) model is regarded as one of the best statistical forecasting methods for investors seeking fast and accurate stock predictions. Moreover, ARIMA models indicate whether a series is already stationary or must be differenced to achieve stationarity (Merh et al. 2010).
The Bombay Stock Exchange (BSE) is considered one of the premier stock markets in the world. The S&P BSE Sensex is the bellwether index of the BSE. It measures the performance of 30 companies listed on BSE Ltd., which are popularly known as blue-chip companies. Among the sectoral indices of the BSE, S&P BSE Information Technology (IT) is one of the most heavily capitalized, with a capitalization of 12.19%, against 100% for the S&P BSE Sensex. The S&P BSE IT index is intended to provide investors with a benchmark reflecting companies included in the S&P BSE All Cap that are classified as members of the IT sector.
The primary objective of this study is to fit the ARIMA model in a way that best estimates the movements of the stock market. Further, it looks into how volatility acts on different time horizons of investment. Furthermore, it examines whether forecasted values are aligned with the actual values.
There are many techniques to forecast the movement of the stock market. The main motive of any stock market forecasting technique is to predict the movement of stock market prices more accurately. However, the existence of information asymmetry, insider trading, and other anomalies may change the direction of the market or lead to inconsistency in market performance. In addition, personal biases of investors, such as overconfidence and illusion of control, the narrative fallacy, anchoring bias, loss aversion, and herding mentality, cause wrong predictions of movements in stock market prices. These are some causes of sudden losses in invested funds, which arise from wrong estimations made by investors about their investments or portfolios (Neely et al. 2014; Wang et al. 2018; Challa et al. 2018). Hence, the underlying problem is to obtain more accurate and faster predictions of stock prices. Several studies have forecast stock prices using GARCH and ARIMA models in developed stock markets, but very few have done so in developing stock markets. Further, most studies restricted themselves to estimating the movement of stock prices and ignored a comparison of the estimated values and the actual values to verify the accuracy of estimation (Zhang et al. 2019a, 2019b). Furthermore, no single study has made a comparison between S&P BSE Sensex and S&P BSE IT. Hence, it is necessary to carry out a detailed investigation to bridge this gap.
The S&P BSE Sensex is the oldest and most popular index of the BSE. It provides the most accurate measurement of the financial position of the stock market. Indeed, it is considered a barometer of the Indian stock market. The IT sector has seen tremendous growth after the liberalization of the Indian economy, and IT and IT-enabled services occupy a lion's share in the service sector. Hence, small changes in these indices may have a great impact on the overall performance of the Indian stock market. The direction as well as the relationship of causation holds good for the IT segment of the BSE.
For this analysis, the authors used statistical and econometric models such as descriptive statistics, variance ratio (VR), Augmented Dickey-Fuller (ADF), Phillips-Perron (PP; Phillips and Perron 1988), Kwiatkowski Phillips Schmidt and Shin (KPSS), and ARIMA. First, the authors conducted an analysis of the performance of the S&P BSE Sensex and IT indices, a review of the literature, and an empirical study on market efficiency. An empirical analysis using the aforementioned models followed to calculate future returns. Moreover, ARIMA models were used to forecast the data series of S&P BSE Sensex and S&P BSE IT; these models can determine whether the actual stock prices are aligned with the estimated values.
The results can be summarized as follows. The descriptive statistics indicate that the mean and variance of the S&P BSE Sensex and S&P BSE IT returns show linearity. In addition, the VR test revealed that the S&P BSE Sensex and S&P BSE IT returns could be strongly predicted based on historical prices. The values of the ARIMA parameters were identified using autocorrelation (AC) and partial autocorrelation (PAC) coefficients; the ADF, PP, and KPSS tests were used to test the stationarity of the data. The results showed that the time series data are stationary. This study estimates the ARIMA model through identified values and auto-ARIMA. The results revealed that the mean returns of both indices are positive, but near zero. This may be an indication of a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. To verify the accuracy of the estimations, the prediction was done for two years, and then the predicted values were compared with the actual values.
As for the EMH, the share prices reflect all the information, and it is impossible to generate a consistent alpha. Hence, it can be inferred that the stocks may not outperform the overall market due to either expert stock selection or market timing. The results indicated a regressive tendency in which the returns are estimated with high accuracy in the long run. This is evident in the case of S&P BSE Sensex and S&P BSE IT, where the estimated and actual values are almost equal. This reveals that both the indices were not following random walk theory. In other words, the movements of the indices are predictable. Hence, both the BSE indices under study exhibited a semi-strong form of EMH, as their stock prices are forecasted based on past data. There is no relevance for the strong form of EMH in this study as the researchers used only public information and ignored private information.

Literature review
Stock returns forecasting mechanisms are important to the development of investment policies. However, based on EMH, consistent risk-adjusted returns (Kou et al. 2014) above the line of market profitability as a whole are not possible. Computational advancements have led to various econometric models, which have been used consistently to anticipate stock market movements and thus forecast future stock prices and stock returns (Suits 1962; Zotteri et al. 2005; Wen et al. 2019). ARIMA models are efficient at forecasting short-term financial time series data (Schmitz and Watts 1970; Rangan and Titida 2006; Kyungjoo et al. 2007; Merh et al. 2010; Sterba and Hilovska 2010). Various studies have used ARIMA forecasting models to predict stock returns (Khashei et al. 2009; Lee and Ho 2011; Khashei et al. 2012). Gerra (1959) examined the stock price movements for the egg industry by using least squares methods. The Box-Jenkins ARIMA approach is more efficient and accurate than other economic models such as regression and exponential smoothing (Reid 1971; Naylor II et al. 1972; Newbold and Granger 1974). The ARIMA approach is more accurate in forecasting short-term stock returns than long-term returns (Sabur and Zahidul Hague 1992). Neely et al. (2014) used technical indicators to forecast stock returns and found that technical indicators are economically and statistically significant. Several studies have relied on the predictability of stock returns (Rapach et al. 2010; Zhu and Zhu 2013; Pettenuzzo et al. 2014; Jiahan and Ilias 2017). Rapach et al. (2010) forecasted the equity premium (Welch and Goyal 2008; Turner 2015) by using compound returns on the S&P 500 index, including dividends and the rate on treasury bills, and established a link between the forecasted values and the real economy. Phan et al. (2015) discussed evidence-based forecasting for stock returns. Rapach et al. (2016) showed the vector autoregression decomposition from a cash flow channel, which in turn showed the source of predictive power. Furthermore, there is evidence of a relationship between short-sellers and traders. Wang et al. (2018) showed the dynamic relationship between returns and volume based on US stock returns. They found that investors do not gain much profit by following the volume curve. Zhang et al. (2018) examined oil price forecasting by using 18 macroeconomic and 18 technical indicators. The results showed accurate forecasts and generated certainty equivalent return gains for a mean-variance investor. Zhang et al. (2019a, 2019b) explained not only the trading behavior of intraday stock movement, but also the evidence of a U-shaped investment curve. They found that afternoon stock prediction is significant using morning returns.
This study analyzed the efficiency of BSE. In the past decades, many researchers discussed the efficiency of stock market predictability (Fama 1970, 1991; Lo and MacKinlay 1988; Fama and French 1988). Stock markets are considered efficient if stock prices fully reflect, at any point in time, relevant or available information. EMH (Fama 1965) is one of the most widely accepted financial theories. Various approaches have been used to test the EMH for stock markets, for instance, serial correlation tests, unit root tests, and VR tests (Wu 1986, 1996; Laurence et al. 1997; Mookerjee and Yu 1999; Liu et al. 1997; Groenewold et al. 2003; Seddighi and Nian 2004). Lo and MacKinlay (1989) showed that VR tests are more powerful than unit root and serial correlation tests (Munteanu and Pece 2015), particularly in the presence of heteroscedasticity.
Individual VR tests in the literature have not provided consensus on the weak EMH, so multiple VR tests are preferable (Long et al. 1999;Darrat and Zhong 2000;Ma and Barnes 2001;Lee and Rui 2001;Lima and Tabak 2004;Fifield and Jetty 2008). Chow and Denning (1993) suggested that multiple VR tests are useful to avoid misleading statistical inferences based on asymptotic normal probabilities. Whang and Kim (2003) and Kim (2006) proposed powerful alternatives: sub sampling of non-dependency asymptotic probability and wild bootstrap probability.
Following this logic, this study adopted multiple VR tests, as suggested by Whang and Kim (2003) and Kim (2006), and the conventional Chow-Denning test to study the random walk hypothesis for the BSE (Diebold and Inoue 2001;Kapetanious and Shin 2011;Aye et al. 2017).

Problem statement
As mentioned earlier, several studies have been carried out on the prediction of stock market returns using ARIMA and other models, especially in developed markets. However, very few have focused on developing and less developed markets. Among the existing models, ARIMA has proved to be more efficient and accurate (Box & Jenkins 1970). Furthermore, the ARIMA model is more suitable for accurate estimates of short-term returns than long-term returns, though many previous studies have used the ARIMA model to estimate long-term returns. However, there are very few studies on the prediction of returns on the Indian stock market in general, and S&P BSE Sensex in particular. It is evident from the literature that no study has predicted the returns of the S&P BSE Sensex and its subcomponent, that is, the S&P BSE IT, which is a sectoral index. This study fills this gap in the literature. Based on the observations of the literature and its objectives, this study hypothesizes that there is no significant relationship between actual and predicted values of S&P BSE Sensex and S&P BSE IT stocks.

Data and methodology
Data were collected from two indices, S&P BSE Sensex and S&P BSE IT. Empirical analysis was carried out on the daily returns of the S&P BSE Sensex and S&P BSE IT indices, for the period January 1, 2007 to December 31, 2017. It was observed that both indices experienced high volatility in performance. The data also experienced the highest shock during the year 2008-2009 for both indices. The reason was the worldwide financial crisis, which also affected the Indian stock market (Eigner and Umlauft 2015).
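In this context, there is a need to determine whether the above-mentioned crisis caused steep swings in the prices of stocks listed on the S&P BSE Sensex and S&P BSE IT. Furthermore, it is also necessary to apply the ARIMA model with validation and testing, which was not done in most previous studies. Therefore, an attempt is made to test and forecast the stock prices by incorporating ARIMA models. The data were collected from www.bseindia.com, and the daily returns were calculated using the following formula.
$$R_{it} = \ln\left(\frac{P_t}{P_{t-1}}\right) \quad \text{(i)}$$
$R_{it}$ is the return of the index; $P_t$ is the closing price of the index at time t; $P_{t-1}$ is the closing price of the index at time t-1; and ln is the natural logarithm. The ARIMA model is used to forecast future returns, and it is a combination of autoregressive and moving average models (Pankratz 2009). In standard notation, the mathematical formula of the model is as follows.
$$y_t = c + \sum_{i=1}^{p} \phi_i \, y_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t \quad \text{(ii)}$$
Here $y_t$ is the return series (after differencing, where required), $c$ is a constant, $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, and $\varepsilon_t$ is a white noise error term.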
The Box-Jenkins method assumes that the time series is stationary; if it is not, the series is made stationary by first-degree differencing. The resulting model is called ARIMA (p, d, q), where p and q are the autoregressive and moving average orders and d is the degree of differencing. If the time series is already stationary (d = 0), the ARIMA (p, d, q) model reduces to an ARMA (p, q) model.
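As a brief illustration of the return calculation and of the differencing step described above, the following Python sketch computes daily log returns from closing prices and applies d-th order differencing. The function names and the sample prices are hypothetical and chosen only for illustration; the study itself used Eviews 9.5.

```python
import math


def log_returns(closing_prices):
    """Daily log returns R_t = ln(P_t / P_{t-1}) from a list of closing prices."""
    if any(p <= 0 for p in closing_prices):
        raise ValueError("closing prices must be positive")
    return [math.log(curr / prev)
            for prev, curr in zip(closing_prices, closing_prices[1:])]


def difference(series, d=1):
    """Apply d rounds of first differencing; d = 0 returns a copy of the series."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return list(series)


# Hypothetical closing prices, for illustration only (not BSE data).
prices = [100.0, 102.0, 101.0, 101.0]
returns = log_returns(prices)
first_diff = difference(prices, d=1)
```

Log returns are additive over time, so the sum of the daily values equals the log return over the whole window; this is one common reason for preferring them to simple percentage changes.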
Many researchers believe that GARCH and EGARCH models cannot provide the best results compared with ARIMA models, and that ARIMA is the best model for forecasting and modeling stock prices (Miswan et al. 2014;Pahlavani and Roshan 2015). Hence, the ARIMA model is appropriate to predict stock returns accurately with prospective market strategies to be followed by investors. Furthermore, some mixed models like ARIMA-GARCH, TGARCH, EGARCH, or GJR may be used to find the volatility of stock prices or returns by assuming symmetric or asymmetric effects. However, according to Thushara (2018), ARIMA and ARIMA-GARCH models produce the same results over time, and volatility does not change. Hence, the ARIMA model, along with the mean and variance equations, is used to predict future returns.
In practice, the appropriate model is determined in four steps. The first step is identification, in which the correlogram and partial correlogram are employed to determine the appropriate values of p, d, and q; the ADF test is used to test the stationarity of the data. The second step is estimation, in which the parameters of the chosen model are estimated using the least squares method. The third step is a diagnostic check to examine whether the residuals from the fitted model are white noise. If they are, the chosen model is accepted; otherwise, the process starts afresh. The method is therefore iterative. In the fourth step, forecasting, the successful ARIMA model from step three is used within and outside the sample period to forecast future returns.
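The estimation and diagnostic steps can be sketched for the simplest case, an AR(1) model fitted by least squares, with a Ljung-Box Q statistic as the residual check. This is a minimal sketch under stated assumptions: the AR(1) specification, the simulated data, and the function names are illustrative and are not the models or data of the study.

```python
import random


def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t.

    Returns (c, phi, residuals).
    """
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    residuals = [yi - (c + phi * xi) for xi, yi in zip(x, y)]
    return c, phi, residuals


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_k rho_k^2 / (n - k) over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    gamma0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / gamma0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Simulated AR(1) series with phi = 0.5 (illustrative data, not index returns).
rng = random.Random(42)
series = [0.0]
for _ in range(2000):
    series.append(0.5 * series[-1] + rng.gauss(0.0, 0.01))

c, phi, residuals = fit_ar1(series)
q_stat = ljung_box_q(residuals, max_lag=12)
```

If the fitted model is adequate, the Q statistic should be small relative to the appropriate chi-square critical value; a large value signals remaining autocorrelation and sends the analyst back to the identification step.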

Descriptive statistics
An overview of the basic statistical features of time series data is necessary before data analysis. The authors used the statistical software Eviews 9.5 to analyze the data and applied each step of the ARIMA process. Figure 1 shows the daily returns of the S&P BSE Sensex and S&P BSE IT, with returns on the 'y' axis and years on the 'x' axis; the years 2007 to 2017 are labeled 1 to 11.
The descriptive statistics of S&P BSE Sensex and S&P BSE IT are summarized in Table 1. Table 1 reveals that the mean returns are positive but nearly zero, which indicates a regressive tendency in the long term. The differences between the minimum and maximum values are 0.1198 (S&P BSE Sensex returns) and 0.0979 (S&P BSE IT returns). The standard deviation is 0.6% for S&P BSE Sensex returns. The Jarque-Bera test rejected the null hypothesis of normal distribution at the 5% level for both the indices.
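The quantities summarized in Table 1 can be illustrated with a short Python sketch that computes the mean, standard deviation, range, skewness, kurtosis, and the Jarque-Bera statistic JB = n/6 · (S² + (K − 3)²/4). The function name and the sample data are hypothetical; population (divide-by-n) moments are assumed, which may differ slightly from the conventions of a given statistical package.

```python
import math


def describe(returns):
    """Descriptive statistics and the Jarque-Bera statistic for a return series."""
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n  # population variance
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jarque_bera = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "range": max(returns) - min(returns),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "jarque_bera": jarque_bera,
    }


# Small symmetric sample, for illustration only (not index returns).
stats = describe([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Under normality the Jarque-Bera statistic is asymptotically chi-square with two degrees of freedom, so values above roughly 5.99 reject normality at the 5% level.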

Variance ratio test
A popular approach to testing the predictability of asset prices is the Lo and MacKinlay (1988, 1989) VR test, which examines the predictability of time series data by comparing the variances of returns at various intervals. If the data follow a random walk, then the variance of a q-period difference must be q times the variance of a single-period difference (Tabak 2003). Hence, the VR test examines whether or not the data follow a random walk. The present analysis follows the rank, rank-score, and sign-based forms of Lo & MacKinlay and Kim to determine statistical significance. The Lo and MacKinlay (1988, 1989) VR test can be performed under homoscedastic and heteroscedastic random walks, using asymptotic normal or wild bootstrap (Kim 2006) probabilities. In addition to the rank, rank-score, and sign-based forms (Wright 2000), tests have been evaluated with bootstrap for statistical significance. Furthermore, Wald and multiple comparison VR tests (Richardson & Smith 1991; Chow and Denning 1993) have been performed for several intervals. In this analysis, the random walk series was assumed to test the data. S indicates the series from 1 to 7: S1 is the VR test for Lo and MacKinlay (1988) homoskedasticity, no bias correction, and random walk series; S2 is the VR test for Lo and MacKinlay (1988) heteroskedasticity and martingale series; S3 is the VR test for the Wright (2000) rank and random walk series; S4 is the VR test for the rank score and random walk series; S5 is the VR test for the sign-based test and martingale series; S6 is the VR test for Kim (2006), homoskedasticity and random walk series using 1000 replications; S7 is the VR test for Kim (2006), heteroskedasticity and random walk series using 1000 replications.
Table 2 shows the calculations of the standard (Lo and MacKinlay 1988, 1989), nonparametric (Wright 2000), and multiple VR tests (Chow and Denning 1993). Table 2 shows that the returns of S&P BSE Sensex and S&P BSE IT could be strongly predicted based on historical prices. Hence, it may be concluded that these indices are not weak-form efficient. This finding is consistent with Rapach et al. (2013), who used the same methods and confirmed that the weak form was rejected.
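The idea behind the variance ratio can be sketched in a few lines of Python. The function below computes a plain VR(q) from overlapping q-period sums without the bias corrections, heteroskedasticity adjustments, rank or sign variants, or bootstrap inference used in the tests above; its name and the toy series are illustrative assumptions only.

```python
def variance_ratio(returns, q):
    """Plain variance ratio VR(q) = Var(q-period return) / (q * Var(1-period return)).

    Uses overlapping q-period sums and divide-by-n variances (no bias correction).
    Under a random walk VR(q) is close to 1; values below 1 suggest mean
    reversion and values above 1 suggest persistence.
    """
    n = len(returns)
    if q < 1 or q >= n:
        raise ValueError("q must satisfy 1 <= q < len(returns)")
    mean = sum(returns) / n
    var_1 = sum((r - mean) ** 2 for r in returns) / n
    q_sums = [sum(returns[t:t + q]) for t in range(n - q + 1)]
    var_q = sum((s - q * mean) ** 2 for s in q_sums) / (len(q_sums) * q)
    return var_q / var_1


# Toy series, for illustration only.
alternating = [1.0, -1.0] * 50           # perfectly mean-reverting returns
persistent = [1.0, 1.0, -1.0, -1.0] * 25  # returns that tend to repeat

vr_alternating = variance_ratio(alternating, 2)
vr_persistent = variance_ratio(persistent, 2)
```

A ratio far from 1, as in both toy series, is the kind of departure from the random walk that the formal tests assess for statistical significance.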

Application of the ARIMA methodology
The ARIMA analysis proceeds in two stages: the first is developing the ARIMA model, and the second is validating the predicted results against the actual ones for the holdback period of two years (January 1st 2015 to December 31st 2017). From the observed literature it is evident that a two-year holdback period is appropriate for validating the accuracy of predictions. The authors also tested whether the residuals are white noise through the diagnosis and parameter significance tests.

Developing the ARIMA model
Correlogram to determine the appropriate values of p, d, and q
AC and PAC are the two types of correlation coefficients shown in correlograms. The autocorrelation function (ACF) represents the correlation of the current first-differenced S&P BSE Sensex and S&P BSE IT returns with their values at up to 12 lags. The partial autocorrelation function (PACF) indicates the correlation between observations k lags apart after the effect of the intermediate lags has been removed. ACF and PACF are applied using the Box-Jenkins methodology to identify the type of ARMA model and determine the appropriate values of p and q. The ACF is calculated by the following formula:
$$\rho_k = \frac{\gamma_k}{\gamma_0} \quad \text{(iii)}$$
$\rho_k$ is the ACF of the given sample at lag k; $\gamma_k$ is the covariance at lag k; and $\gamma_0$ is the sample variance. Figure 2 shows the AC, PAC, Q-statistics, and probabilities of the S&P BSE Sensex and S&P BSE IT returns for 12 lags. The standard error calculation is used to test the significance of each AC coefficient. The dotted lines represent the error bounds on each side of the AC and PAC, which could be measured using the following formula.
$$\hat{\rho} \approx \pm \frac{2}{\sqrt{T}} \quad \text{(iv)}$$
Figure 2 shows that few correlations are statistically significant using the standard error of the correlation coefficient, which can be calculated as $\sqrt{1/n} = \sqrt{1/2724} = 0.01916$, where n is the sample size. Therefore, the 95% confidence interval, according to the normal distribution for $\hat{\rho}_k$, is 0 ± 1.98084 (0.01916), or (−0.037953 to 0.037953). If correlation coefficients are outside these bounds, they are statistically significant at the 5% level. Hence, both ACF and PACF correlations at lags 1, 2, 6, and 8 seem to be statistically significant for S&P BSE Sensex. Therefore, the p and q values for the ARMA model are 1, 2, 6, and 8 for S&P BSE Sensex, which can be denoted as AR (1), AR (2), AR (6), and AR (8) for the autoregression lags and MA (1), MA (2), MA (6), and MA (8) for the moving average lags. For S&P BSE IT, the significant correlation lags are 1, 2, and 5, which can be designated as AR (1), AR (2), AR (5), MA (1), MA (2), and MA (5).
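The sample ACF and the significance bounds quoted above can be reproduced with a minimal Python sketch. The function names are hypothetical; the multiplier 1.98084 and the sample size 2724 are taken from the text, and the toy series is for illustration only.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations rho_k = gamma_k / gamma_0 for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    gamma_0 = sum(d * d for d in dev) / n
    result = []
    for k in range(max_lag + 1):
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        result.append(gamma_k / gamma_0)
    return result


def significance_bound(n, z=1.98084):
    """Half-width of the confidence band around zero: z * sqrt(1 / n)."""
    return z * math.sqrt(1.0 / n)


def significant_lags(series, max_lag, z=1.98084):
    """Lags (from 1) whose sample autocorrelation lies outside the band."""
    bound = significance_bound(len(series), z)
    rho = acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(rho[k]) > bound]


standard_error = math.sqrt(1.0 / 2724)  # sample size reported in the text
bound = significance_bound(2724)

# Toy alternating series, for illustration only.
toy = [1.0, -1.0] * 50
toy_acf = acf(toy, 2)
toy_lags = significant_lags(toy, 2)
```

Lags flagged in this way are candidates for the AR and MA terms; the PACF would be screened against the same band.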

Unit root tests
The unit root tests are used to examine stationarity in the series. In the present analysis, three tests are conducted to check for the presence of unit roots: ADF, PP, and KPSS. For the ADF and PP tests, the null hypothesis that the stock returns series has a unit root was rejected, as the p-values were below 5%. The KPSS test, whose null hypothesis is stationarity rather than a unit root, likewise supported stationarity. Therefore, all three tests confirmed that the series are stationary and do not contain unit roots.

TE1: test equation with intercept; TE2: test equation with trend and intercept; TE3: test equation without intercept.
Table 3 shows strong evidence of stationarity for S&P BSE Sensex and S&P BSE IT returns, with the absence of long-term shocks in their returns. The three unit root tests show the same results in the cases without intercept, with intercept, and with trend and intercept for S&P BSE Sensex and S&P BSE IT.
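The logic of a unit root test can be illustrated with the simplest member of the family, a Dickey-Fuller regression with intercept, Δy_t = α + ρ·y_{t−1} + e_t, in which a strongly negative t-statistic on ρ rejects the unit root. The sketch below is an assumption-laden simplification: it omits the lagged difference terms of the ADF test and does not implement the PP or KPSS tests, and the function name and simulated data are illustrative only.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in the regression dy_t = alpha + rho * y_{t-1} + e_t."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series, series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    rho = sxy / sxx
    alpha = mean_y - rho * mean_x
    rss = sum((d - alpha - rho * x) ** 2 for x, d in zip(y_lag, dy))
    se_rho = math.sqrt(rss / (n - 2) / sxx)
    return rho / se_rho


# Simulated white-noise "returns" (illustrative, not index data).
rng = random.Random(7)
noise = [rng.gauss(0.0, 0.01) for _ in range(1000)]
t_noise = dickey_fuller_t(noise)

# Approximate asymptotic 5% critical value for the intercept-only case.
CRITICAL_5PCT = -2.86
is_stationary = t_noise < CRITICAL_5PCT
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual Student t values, because its distribution under the null hypothesis is non-standard.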

ARIMA model estimation through identified p, d, q values
ARIMA is a combination of AR and MA terms. To estimate the best-fit values, a linear regression model was executed. The S&P BSE Sensex best-fit ARMA model is based on lags 1, 2, 6, and 8; the AR and MA terms were estimated, and the results are shown in Table 7 in Appendix, which gives the estimation criteria for both the S&P BSE Sensex and S&P BSE IT. For S&P BSE IT, the AR and MA terms 1, 2, and 5 are significant, but for S&P BSE Sensex, MA (8) is not significant. Therefore, MA (8) was dropped, and the S&P BSE Sensex model was re-estimated with the AR (1), AR (2), AR (6), MA (1), MA (2), and MA (6) terms; this estimation is depicted in Table 8 in Appendix. The results are shown in Fig. 4, which reveals the randomly distributed residuals from the least squares regression method. The Akaike Information Criterion (AIC) and Schwarz Criterion (SC) are the preferred measures for choosing the best model. The AIC value for S&P BSE Sensex is −7.019098 for the AR term and −7.304545 for the MA term. The SC value for S&P BSE Sensex is −7.008247 for the AR term and −7.293694 for the MA term. In the case of S&P BSE IT, the AIC values for the AR and MA terms are −6.720785 and −7.038715, respectively. The SC values for S&P BSE IT (−6.709934 for the AR term) are reported in Table 4. In general, the maximum likelihood estimation made through the outer product of the gradients/Berndt-Hall-Hall-Hausman method for least squares follows the AR term. For ARIMA models, the likelihood is difficult to write as an explicit function of the parameters, but it can be expressed in terms of the innovations, or prediction errors. The combination of (1, 6) for S&P BSE Sensex gave the best-fit ARMA model, as shown in Fig. 3. Figure 3 also shows the best-fit ARMA model for the IT sector, whose terms are 1 and 2.
The residuals from both best-fit models were subjected to the ADF test, which revealed that they are stationary.

ARIMA model estimation through auto ARIMA
The auto-ARIMA estimation was carried out using AIC comparisons, which determine the best fit of the time series data for future forecasting. In this procedure, 25 candidate ARMA specifications were estimated. The estimated ARMA terms and respective AIC values are presented in Table 5.
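Order selection by AIC comparison amounts to a grid search over candidate (p, q) pairs. The sketch below assumes a 5 × 5 grid (p and q from 0 to 4), which gives 25 candidates; whether this is the grid used in the study is an assumption. The `fit_aic` argument is a hypothetical callable that fits an ARMA(p, q) model and returns its AIC, and `toy_aic` is an artificial stand-in used only to make the example runnable.

```python
from itertools import product


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_order_by_aic(fit_aic, max_p=4, max_q=4):
    """Evaluate every (p, q) on the grid and return the pair with the lowest AIC.

    fit_aic is a hypothetical callable (p, q) -> AIC supplied by the caller.
    """
    candidates = {
        (p, q): fit_aic(p, q)
        for p, q in product(range(max_p + 1), range(max_q + 1))
    }
    best = min(candidates, key=candidates.get)
    return best, candidates


def toy_aic(p, q):
    """Artificial AIC surface with its minimum at (1, 2); not real estimates."""
    return (p - 1) ** 2 + (q - 2) ** 2 - 7.0


best_order, table = select_order_by_aic(toy_aic)
```

In a real application `fit_aic` would wrap the estimation routine of the chosen statistical package, and the resulting table would correspond to the list of ARMA terms and AIC values reported by the procedure.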

Forecasting ARIMA
Once the ARMA model is fitted, it can be used for forecasting future returns. This is possible through two types of forecasting methods: static and dynamic. Static forecasting uses the actual present and lagged values, whereas dynamic forecasting uses the previously forecasted values. Using the model in Fig. 3, the static and dynamic forecasting values are shown in Table 6. Root mean square error (RMSE) and mean absolute error (MAE) were the measures used to select the more appropriate forecasting model. Table 6 provides the RMSE and MAE values of S&P BSE Sensex and S&P BSE IT returns. MAE and RMSE were calculated from the errors between the forecasted and the actual data. The selected ARMA models provide more accurate results for the holdback period.
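The difference between static and dynamic forecasting, and the two error measures, can be sketched for an AR(1) model in Python. The AR(1) form, the coefficient values, and the sample data are illustrative assumptions, not the fitted models of the study.

```python
import math


def static_forecast(c, phi, actual):
    """One-step-ahead forecasts; each uses the actual previous observation."""
    return [c + phi * prev for prev in actual[:-1]]


def dynamic_forecast(c, phi, start, steps):
    """Multi-step forecasts; each uses the previous forecast, not the actual value."""
    forecasts = []
    prev = start
    for _ in range(steps):
        prev = c + phi * prev
        forecasts.append(prev)
    return forecasts


def rmse(actual, forecast):
    """Root mean square error between actual and forecasted values."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error between actual and forecasted values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative values only (not index returns or estimated coefficients).
actual = [0.04, 0.02, -0.01, 0.03]
static = static_forecast(0.0, 0.5, actual)            # forecasts for actual[1:]
dynamic = dynamic_forecast(0.0, 0.5, actual[0], 3)    # forecasts for actual[1:]
static_rmse = rmse(actual[1:], static)
dynamic_rmse = rmse(actual[1:], dynamic)
```

Because dynamic forecasts feed on earlier forecasts, their errors tend to accumulate over the horizon, whereas static forecasts are re-anchored to the observed data at every step.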

Validation for actual and forecast values
The validation phase is important to determine the accuracy of the predicted values. This could be achieved by using a static forecasting instrument in the ARIMA process.
In other words, after the completion of the estimation phase, the authors forecasted the future returns and compared these forecasted returns with the actual ones. In this study, the holdback period was from January 1, 2015 to December 31, 2017. The actual and forecasted values are depicted in Fig. 4. In Fig. 4(a), SENSEX_RETF refers to the forecasted values, which are shown with a blue line. DSEN refers to the first-difference values of S&P BSE Sensex returns, which are marked by a red dashed line. The two lines move together, which means that the forecasted values and the actual values are almost the same. However, a few variations were identified in May 2015, August 2015, and February 2016. These variations may indicate error-prone areas of prediction, reflected in the RMSE (0.005) and MAE (0.004) shown in Table 6. Figure 4(b) provides the comparative graph of the S&P BSE IT sector, which represents IT_RETURNSF (forecasted IT returns) with a blue line and DIT (first difference of IT returns) with a red dashed line. The forecasted and actual values are almost the same, but a few variations were observed in July 2015, August 2015, July 2016, June 2017, and August 2017, which indicate prediction errors, as evidenced by the RMSE (0.006) and MAE (0.005) in Table 6.

Findings of the study
The descriptive statistics of S&P BSE Sensex and S&P BSE IT revealed that the mean returns were positive but nearly zero. This indicates a regressive tendency in the long-term values. An asymmetric tail indicates a high probability of earnings in returns with high risk, as the value of skewness is greater than the mean value of returns. The S&P BSE Sensex Jarque-Bera value is much higher than would be expected under a normal distribution. Therefore, the null hypothesis of normal distribution was rejected at the 5% level for both the S&P BSE Sensex and S&P BSE IT. The statistics of the standard VR test, non-parametric VR test, multiple VR test, and modified version of the multiple VR test rejected the null hypothesis of a random walk or martingale for both index returns. Therefore, the returns of the S&P BSE Sensex and S&P BSE IT could be strongly predicted based on historical prices. Thus, it may be concluded that the results did not provide any evidence in favor of the EMH for either S&P BSE Sensex or S&P BSE IT in the long run. The findings suggest that past information priced the stocks instantly, as these indices indicate a semi-strong form of EMH.

Conclusion
ARIMA methodology, also referred to as the Box-Jenkins (BJ) method, is one of the most widely used forecasting methods for the stock market. It is useful for analyzing a time series in terms of its own historical values and a moving average of random error terms. In this analysis, ARIMA (1, 6) for Sensex and ARIMA (1, 2) for IT yielded a highly accurate forecast over the two-year holdback period. Forecast uncertainty was found to be greater when the period is long and smaller when the period is short. The study reveals the efficiency of the process in predicting the complex and volatile series of stock data. By applying ARIMA, fast and accurate prediction was confirmed using time series data.
The results showed that the mean returns of both the indices are positive but near zero. This indicates a regressive tendency in the long term. The forecasted values of S&P BSE Sensex and S&P BSE IT are almost equal to the actual values, with few deviations. These findings have significant implications. Investors can choose their investments according to the forecasted returns analyzed in the present study. Furthermore, investors can invest in profitable stocks to ensure a good portfolio. This study could help researchers, companies, investors, and policymakers to make appropriate decisions in the stock market. Further, researchers can investigate time series prediction by applying various models, such as genetic models, nanotechnology models, and non-linear regression models. Companies may frame appropriate strategies to fetch lucrative returns on their investments. An optimum portfolio may be built for individual investors, and policymakers can take relevant decisions for the smooth functioning of the stock market.
Nonetheless, this study suffers from some limitations. It was confined to S&P BSE Sensex and S&P BSE IT, which comprise only a few companies of the Indian corporate sector. There are many sectoral indices under the BSE; including them could have provided a more holistic study and given investors clues for deriving better returns on investments. Furthermore, the study could have compared the accuracy of the estimated returns across various time horizons.
Future research can consider the prediction and comparison of stock prices in developed and emerging stock markets. Moreover, long-term forecasting that applies novel technologies may help investors secure good returns. Comparative analysis of various sectoral indices between India and other countries is a promising area for further insights into portfolio construction, risk and return, performance, and efficiency of trading.