Scientometrics Analysis in Google Trends

The periodic non-stationarity feature of time series is extracted from Google Trends data using keywords from modern sciences. This study investigates when a keyword time series exhibits non-stationarity peaks, because the analysis of non-stationary categorical time series has been shown to give a good fit in prediction. The method is implemented via an algorithm based on the extraction of the non-stationary distance and the formulation of a polynomial regression. The non-stationarity algorithm is applied, and the statistical evaluation is performed using the non-parametric Cochran's Q test. The Q test leads to the conclusion that the Medicine and Biochemistry sciences rank at the top of users' preferences, followed by Physics, Mathematics and Social Sciences, while emerging sciences such as Materials Science occupy the last positions in the ranking.


INTRODUCTION
The Internet is a ubiquitous source of information, and 63% of users use Google for their searches. Besides the question of reliability, there are online search engines that excel in more specific areas and provide solutions depending on the user's requirements and what they are looking for. Every day, Google's search engine returns more than 5 billion search results worldwide. Thus, by collecting more data, researchers have more room to improve their findings, which attracts more researchers, who in turn create even more data, in a cycle that never ends [1]. In this case, search results are ranked by value and utility rather than by who can pay more. Organizations engaged in research and dissemination, such as those producing university rankings, use data from the previous five years.
Google Trends is a public web facility of Google Inc., based on Google Search, that allows users to see how popular specific keywords or subjects are over a period of time. It shows how often a search term is entered relative to the total search volume across various regions of the world and in multiple languages [2]. The horizontal axis of the main graph represents time (starting from 2004), and the vertical axis shows how often a term is searched for relative to the total number of searches, globally. Google Trends thus shows the relative popularity of a search query [3]; in other words, relative popularity is the ratio of a query's search volume to the sum of the search volumes of all possible queries. A further important issue is to match Google Trends entries to the terminology of Citizen Science (CS) [4]. Nowadays, the frequency with which citizens use keywords on the web is a significant indicator of the impact of the sciences. Furthermore, this mechanism would allow advertisers and search engines to predict the effectiveness and quality of advertisements before they are shown [5]. On the other hand, in some embodiments the time series corresponds to the aggregated metrics of an entire source, free of any precondition, whereas in others a condition is associated with the time series in a data structure; in the latter case, the time series is referred to as a "conditioned" time series. In some embodiments, a source has several conditioned time series, including metrics such as visits, page views, bounce rate, pages/visit, new visits and average time on site [6]. An important metric for such data is the (non-)stationarity of the time series process.
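As a rough illustration of the relative-popularity definition above, the sketch below divides a query's volume by the total search volume at each time step and then rescales the shares so that the peak equals 100, which is how Google Trends presents its index. The function names and the input volumes are hypothetical, and the real service works on a sample of searches, which this sketch ignores.

```python
from typing import List


def relative_popularity(query_volume: List[float], total_volume: List[float]) -> List[float]:
    """Share of all searches at each time step that went to the query."""
    return [q / t for q, t in zip(query_volume, total_volume)]


def trends_index(query_volume: List[float], total_volume: List[float]) -> List[int]:
    """Rescale the shares to 0-100, with 100 at the peak share (Trends-style index)."""
    shares = relative_popularity(query_volume, total_volume)
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]


# Hypothetical weekly volumes. In week 3 the query's raw volume doubles, but so does
# the total traffic, so its share (and therefore its index) does not change.
print(trends_index([20, 80, 40], [1000, 1000, 2000]))  # [25, 100, 25]
```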
Journal of Scientometric Research, Vol 8, Issue 1, Jan-Apr 2019

Reference [7] affirms that non-stationarity in the mean, i.e., a non-constant level, can be modeled in different ways. Taking this into account, Google Trends data are treated here as an "ordinary" time series case. Furthermore, the problem considered in this paper relates to large conditioned time series datasets; this problem is addressed in related studies [8,9]. Just as with "ordinary" time series, the problem of forecasting or prediction in categorical series is important, except that it usually concerns the estimation of a future transition probability given past data and auxiliary information. Furthermore, the partial likelihood analysis of a general regression model for non-stationary categorical time series [10] showed that the prediction and classification of non-stationary categorical time series give a good fit in the prediction problem, using tools such as martingale theory [11]. Thus, in light of the above theories, a non-stationarity metric of a time series is significant for the prediction procedure. For this reason, a method is adopted that measures the non-stationary distance by comparing each numerical series with its time-reversed series [12]. This distance is based on a novel stationary ergodic process, in which the stationary series has reversible, symmetric features; it is calculated using the Dynamic Time Warping (DTW) algorithm in a self-correlation procedure.
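A minimal sketch of the idea of measuring non-stationarity by comparing a series with its own time reversal via DTW. It assumes a plain DTW with absolute-difference local cost and the three standard steps (match, insertion, deletion), and takes the unnormalized cumulative cost as the distance; the exact DTW variant and self-correlation procedure of [12] may differ, and the function names are hypothetical.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW: minimal cumulative |a_i - b_j| cost over monotone warping paths."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def nonstationary_distance(series: Sequence[float]) -> float:
    """DTW distance between a series and its time-reversed (mirror) copy."""
    mirror = list(series)[::-1]
    return dtw_distance(series, mirror)


print(nonstationary_distance([1, 2, 3, 2, 1]))  # 0.0: time-reversible (palindromic) shape
print(nonstationary_distance([1, 2, 3]))        # 4.0: a trend differs from its reversal
```

A distance of zero means the series cannot be told apart from its reversal under DTW; larger values indicate stronger time asymmetry in this simplified setting.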
The strategy adopted here is to divide the Google Trends data into segments of 64 weeks and then to analyze one segment at a time, divided into eight equal sub-segments that correspond to weekly periods. The Google Trends data comprise 12 datasets corresponding to 12 popular Internet keywords [13], each containing 261 weeks of data. Thus, the aim of this study is to investigate when a keyword time series exhibits non-stationarity peaks. For this reason, the non-stationarity algorithm [12] is applied, and the statistical evaluation is performed using the non-parametric Cochran's Q test.
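The segmentation can be sketched as follows, assuming the 64-week window is slid forward one week at a time for 20 repetitions (the 20 "treatments" used later in the Cochran's Q tables) and that each window is cut into eight consecutive 8-week periods. The repetition count, start offset and function names are assumptions for illustration.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], window: int = 64,
                    repetitions: int = 20, start: int = 0) -> List[List[float]]:
    """Return `repetitions` windows of length `window`, each shifted by one step."""
    if start + repetitions - 1 + window > len(series):
        raise ValueError("series too short for the requested windows")
    return [list(series[start + j:start + j + window]) for j in range(repetitions)]


def split_periods(window: Sequence[float], periods: int = 8) -> List[List[float]]:
    """Cut one window into `periods` consecutive, equal-length sub-segments."""
    size, remainder = divmod(len(window), periods)
    if remainder:
        raise ValueError("window length must be divisible by the number of periods")
    return [list(window[p * size:(p + 1) * size]) for p in range(periods)]


weeks = list(range(261))          # stand-in for 261 weekly Trends values
windows = sliding_windows(weeks)  # 20 windows: weeks 0-63, 1-64, ..., 19-82
periods = split_periods(windows[0])
print(len(windows), len(periods), periods[1])  # 20 8 [8, 9, 10, 11, 12, 13, 14, 15]
```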

Method
The proposed procedure is divided into two sections. The first section describes the basis of the algorithm, giving the methodology for extracting the non-stationary distance and formulating the polynomial regression; the second covers the statistical evaluation of the results. The procedure is summarized in Figure 1.

Related Work
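The calculation of the degree of stationarity is based on previous works [12,13], in which a discrete-time process is defined as stationary [9] if, for every natural number $n$ and every time shift $h$, equation (1) holds:

$$ (M_{1}, \dots, M_{n}) \overset{d}{=} (M_{1+h}, \dots, M_{n+h}) \tag{1} $$

that is, the joint distribution of any $n$ consecutive observations does not change under a shift in time. It is then assumed that a discrete time series $M_n$ corresponds to a mirror time series $\hat{N}_n$, defined in equation (2), where $L$ is the index of the last observation:

$$ \hat{N}_n = M_{L-n}, \qquad n = 0, 1, \dots, L \tag{2} $$

For $\text{error} = 0$, the time series $M_n$ yields a stationary process, where the error is a dissimilarity measure between the discrete time series $M_n$ and its mirror $\hat{N}_n$ [16,17]. Accordingly, when the path is the lowest-cost one between the two series, the DTW technique [16] gives the warping curve

$$ \phi(k) = \big(\phi_M(k), \phi_{\hat{N}}(k)\big), \qquad k = 1, 2, \dots, T. $$

The warping functions $\phi_M$ and $\phi_{\hat{N}}$ remap the time indices of $M$ and $\hat{N}$, respectively. Given $\phi$, the average accumulated distortion between the warped time series $M$ and $\hat{N}$ is calculated as follows [16]:

$$ d_\phi(M, \hat{N}) = \sum_{k=1}^{T} d\big(\phi_M(k), \phi_{\hat{N}}(k)\big) \, \frac{m_\phi(k)}{M_\phi} $$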
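For a concrete reading of the accumulated-distortion formula, the sketch below evaluates $d_\phi$ for one given warping path, using the absolute difference as the local dissimilarity. Taking $M_\phi$ as the sum of the step weights $m_\phi(k)$ (unit weights by default) is an assumption made here; DTW implementations choose the normalization according to their step pattern. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def accumulated_distortion(m: Sequence[float], n_hat: Sequence[float],
                           path: List[Tuple[int, int]],
                           weights: Optional[List[float]] = None) -> float:
    """d_phi = sum_k d(phi_M(k), phi_N(k)) * m_phi(k) / M_phi for one warping path.

    `path` holds the pairs (phi_M(k), phi_N(k)) as 0-based indices; the local
    dissimilarity is d(i, j) = |m[i] - n_hat[j]|.
    """
    if weights is None:
        weights = [1.0] * len(path)      # m_phi(k) = 1 for every step
    if len(weights) != len(path):
        raise ValueError("one weight per warping step is required")
    norm = sum(weights)                  # M_phi, assumed to be the sum of the weights
    total = sum(w * abs(m[i] - n_hat[j]) for (i, j), w in zip(path, weights))
    return total / norm


series = [1.0, 2.0, 3.0]
mirror = series[::-1]                    # N_hat_n = M_{L-n}
diagonal = [(0, 0), (1, 1), (2, 2)]
print(accumulated_distortion(series, mirror, diagonal))  # 1.3333333333333333
```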

Implementation Stage
The algorithmic steps are divided into two parts: the first part describes the basis of the algorithm in five steps, and the second part, which is the statistical procedure applied to the results, consists of steps 6 and 7.

Basis of the algorithm
Step 1: Consider the matrix M, which contains the weekly data obtained from Google Trends, of size ( ). Let j denote the repetition index of the algorithm, which is applied each time to a window of equal length slid with a unit step; let i denote the window size, which is fixed for every experiment; and let x denote the starting point of the series [13]. The respective mirror data set is then formed, and the stationarity value is obtained via equation (7) and presented in a square matrix.
Step 2: A matrix is then constructed and, using the same procedure, a second matrix is produced, where $0 < y < R - n$, in order to obtain a correlated pair of matrices.

Statistical Procedure
Step 6: The non-parametric Cochran's Q test is then applied in order to test whether the change rate of the adjacent elements is significant.
Step 7: Finally, the results of the Cochran's Q test are ranked in order to predict the non-stationarity feature of each keyword.

Experimental Part
In this stage, twelve different keywords are selected following [15], which combined the research literature on Big Data with the Scopus database.
Accordingly, our approach to characterizing the Big Data literature makes use of the Scopus database, combining different selected keywords that are useful for detecting the trends and peculiarities of Big Data research. These keywords are: Computer Science, Engineering, Mathematics, Social Sciences, Medicine, Decision Sciences, Business, Biochemistry, Materials Science, Physics, Earth Science and Energy. The experimental part consists of 12 datasets sourced from Google Trends for the above keywords. The following Google Trends filters were selected: a period of five years, from 6-9-213 to 6-9-2017 (264 weeks in total); all geographic regions; and all thematic categories of web search. The 12 resulting time series are shown in Figure 2.

Results-Statistical Evaluation
Step 6: The non-parametric Cochran's Q test is applied to each keyword. In more detail, following steps (1-3), the proposed algorithm is implemented with an eighth-order polynomial model. In theory, an eighth-order polynomial has at most seven turning points (the real roots of its seventh-degree derivative), which bounds the number of local peaks that the fitted curve can show. A table is then constructed for each keyword, consisting of 8 columns and 20 rows, which correspond to the weekly periods and the treatments (see Step 1), respectively; in total, 12 tables (Tables 2-13) are created. Each table is then submitted to the non-parametric Cochran's Q test on dichotomous data for 20 related trials, under the null hypothesis that the K = 8 columns (weekly periods, see Step 5) of the N-by-K matrix, with N = 20 rows, have the same number of successes. H = 0 indicates that the null hypothesis cannot be rejected at the 5% significance level; H = 1 indicates that it can be rejected at the 5% level. The input X should contain the dichotomous values 0/1, where one value indicates a "pass" and the other a "fail". In this experiment, the values 0 and 1 indicate the absence and the presence of a local peak, respectively. The K = 8 columns correspond to K related observations, and the N = 20 rows correspond to N distinct cases. The coding of "pass" and "fail" does not matter: the higher value is treated as a "pass". Cases that comprise only passes or only failures do not affect the test statistic. X can also be a cell array of two different strings (e.g., '0' and '1') [18,19]. The Q test returns a structure with the following fields, which are reported in Tables A (1-12):
'Q' -- the value of the test statistic
'df' -- the degrees of freedom of the test
'Fail' -- the value regarded as a failure
'Pass' -- the value regarded as a success
'Npass' -- the number of successes in each column
'Neff' -- the number of effective cases (i.e., the number of cases that do show differences among the K observations)
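A self-contained sketch of Cochran's Q on an N-by-K table of 0/1 values (rows = cases, columns = related observations), returning fields named after those listed above plus a p-value from the chi-square distribution with K - 1 degrees of freedom. It is not the MATLAB routine of [18,19]; the chi-square tail uses the standard recurrence for integer degrees of freedom, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1.

    Uses sf(k + 2, x) = sf(k, x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from sf(1, x) = erfc(sqrt(x/2)) or sf(2, x) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, k = math.exp(-x / 2), 2
    else:
        sf, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        sf += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return sf


def cochran_q(table: Sequence[Sequence[int]], alpha: float = 0.05) -> Dict[str, object]:
    """Cochran's Q test on an N-by-K table of 0/1 values (1 = pass, 0 = fail)."""
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]  # 'Npass'
    row_totals = [sum(row) for row in table]
    grand = sum(row_totals)
    # Rows that are all passes or all fails cancel out of Q; they are not counted in 'Neff'.
    n_eff = sum(1 for r in row_totals if 0 < r < k)
    denom = k * grand - sum(r * r for r in row_totals)
    q = 0.0 if denom == 0 else (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2) / denom
    p = chi2_sf(q, k - 1)
    return {"Q": q, "df": k - 1, "p": p, "H": int(p < alpha),
            "Fail": 0, "Pass": 1, "Npass": col_totals, "Neff": n_eff}


# Hypothetical 5 cases x 3 periods (1 = local peak present in that period).
example = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]]
result = cochran_q(example)
print(result["Q"], result["df"], result["Neff"], result["Npass"])  # 6.0 2 4 [5, 3, 1]
```

The last row of the example (all passes) leaves Q unchanged, which matches the statement that such cases do not affect the test statistic.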
Step 7: Finally, the results of the Cochran's Q test are ranked in order to predict the non-stationarity feature of each keyword (see Table 1).
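Step 7 only states that the Q results are ranked. One hypothetical way to do this is to sort keywords by their Q statistic, largest first, so that keywords whose peak pattern differs most across the weekly periods come first; the sort key, the direction and the placeholder values below are assumptions, not the contents of Table 1.

```python
from typing import Dict, List, Tuple


def rank_keywords(q_values: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, keyword, Q) tuples sorted by Q in descending order."""
    ordered = sorted(q_values.items(), key=lambda item: item[1], reverse=True)
    return [(rank, keyword, q) for rank, (keyword, q) in enumerate(ordered, start=1)]


# Placeholder statistics for three hypothetical keywords.
for row in rank_keywords({"keyword_a": 3.5, "keyword_b": 9.1, "keyword_c": 6.2}):
    print(row)
```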

CONCLUSION AND LIMITATIONS
In this paper, the potential to extract the periodic non-stationarity feature exhibited by twelve Google Trends data sets (time series) is investigated. In more detail, the keyword "science", which is the most common label of CS research, is investigated [4]. The terminology was collected from previous relevant research [15]. The non-stationarity distance metric is adopted because it is a significant factor in the prediction and classification of non-stationary categorical time series [10]. Each keyword's time series is analyzed in levels or differences. The stationary distance is calculated through an appropriate transformation by a new algorithm [12,13], which is based on a self-correlation procedure using the Dynamic Time Warping (DTW) algorithm. A new stationary-distance time series is then generated using this algorithm. This generation showed that each keyword presents a different periodicity. Furthermore, the positions of the local (non-)stationary peaks in eight-week periods are investigated using an 8th-order polynomial model. The statistical evaluation of the locally collected peaks tested whether the change rate of adjacent elements is significant, using the non-parametric Cochran's Q test. The ranking of the selected keywords in Table 1, extracted via Cochran's Q test, agrees with recent research [4]. This agreement leads to the conclusion that the Medicine and Biochemistry sciences rank at the top of users' preferences, followed by Physics, Mathematics and Social Sciences, while emerging sciences such as Materials Science occupy the last positions.
As future work, this study will be extended to include more conditioned time series, with metrics such as visits, page views, bounce rate, pages/visit, new visits and average time on site [20]. This involves studying more sophisticated time series processing and template matching techniques.


The dissimilarity between $M_n$ and its reversed counterpart $\hat{N}_n$ is computed via the Euclidean distance and Dynamic Time Warping (DTW). A local dissimilarity function $f$ is defined for any pair of elements $M_i$ and $\hat{N}_j$, with the shortcut

$$ d(i, j) = f(M_i, \hat{N}_j) \ge 0. $$

In the accumulated distortion $d_\phi$, $m_\phi(k)$ is a per-step weighting coefficient and $M_\phi$ is a normalization constant, which ensures that the accumulated distortions are comparable along different paths. To ensure realistic warps, constraints are usually imposed on $\phi$. The idea behind DTW is to find the optimal alignment $\phi$, i.e., the deformation of the time axes of $M$ and $\hat{N}$ that brings the two time series as close together as possible:

$$ D(M, \hat{N}) = \min_{\phi} d_\phi(M, \hat{N}). $$

Figure 1: The Graphical Depiction of the Proposed Method.

Equations (11) are calculated in order to extract the local maxima points of the graphs of the matrices [MA] and [MB]. Step 5: The differences between adjacent elements of the [F] and [G] matrices are then calculated.
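Steps 4 and 5 locate local maxima and use differences between adjacent elements. A minimal sketch, assuming a curve has already been produced (for instance by evaluating the fitted polynomial on the weekly grid): a point is taken as a local maximum when the adjacent difference switches from positive to negative. The strict rule and the function names are assumptions; flat tops are not reported.

```python
from typing import List, Sequence


def adjacent_differences(values: Sequence[float]) -> List[float]:
    """Differences between adjacent elements: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def local_maxima(values: Sequence[float]) -> List[int]:
    """Indices where the adjacent difference switches from rising to falling.

    Flat tops (zero differences) are not reported by this strict rule.
    """
    diffs = adjacent_differences(values)
    return [i + 1 for i in range(len(diffs) - 1) if diffs[i] > 0 and diffs[i + 1] < 0]


curve = [0, 2, 5, 3, 1, 4, 6, 2]
print(adjacent_differences(curve))  # [2, 3, -2, -2, 3, 2, -4]
print(local_maxima(curve))          # [2, 6]
```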

Step 4: According to equations (11), the local maxima points of the graph of the matrix $MD_A$ are calculated and stored in a new matrix, where $d(w)$ is the variance of the time period. The twelve graphs, with their corresponding local maxima points, are shown in Figures 3-14.
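To feed the Cochran's Q test, each repetition's peak positions must become one dichotomous row with one entry per weekly period. A sketch of that mapping, assuming a 64-week window, eight 8-week periods and 0-based peak indices; these conventions and the function name are assumptions.

```python
from typing import List, Sequence


def peak_indicator_row(peak_indices: Sequence[int], window: int = 64,
                       periods: int = 8) -> List[int]:
    """1 if at least one local peak falls in a period, else 0."""
    size = window // periods
    row = [0] * periods
    for idx in peak_indices:
        if not 0 <= idx < window:
            raise ValueError(f"peak index {idx} outside the window")
        row[idx // size] = 1
    return row


print(peak_indicator_row([3, 20, 21, 63]))  # [1, 0, 1, 0, 0, 0, 0, 1]
```

Stacking the rows of the 20 repetitions gives a 20-by-8 table of the kind submitted to the test in Step 6.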

Figure 2: The Time Series of Subject Areas According to Google Trends.

Figure 3: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Computer Science.

Figure 4: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Engineering.

Table 6:
Table 8:

Figure 5: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Mathematics.

Figure 6: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Social Sciences.

Figure 7: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Medicine.

Figure 8: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Decision Sciences.

Table 13: Weekly Appearance of Stationary Peaks for the Keyword (Energy) over the Past Five Years in All Categories, Worldwide, via Web Search.

Figure 10: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Biochemistry.

Figure 12: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Physics.

Figure 13: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Earth Science.

Figure 14: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Energy.

Figure 9: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Business.

Figure 11: The Depiction of the Twenty Training Procedures of the Algorithm for the Keyword Materials Science.