APPLICATION OF CLUSTERING TECHNIQUES TO STUDY THE TRAINING PATTERN PROVIDED BY THE DIFFERENT INSTITUTES UNDER HSRT

1. Assistant Professor in Management, Department of Commerce and Management, West Bengal State University, Barasat, W.B., India.
2. Faculty, University Institute of Technology, Burdwan, W.B., India.
3. Faculty, Department of Business Administration, Burdwan Raj College, Burdwan, W.B., India.

Manuscript History: Received: 10 April 2020; Final Accepted: 12 May 2020; Published: June 2020

categorization of clustering techniques like density-based methods, model-based methods and grid-based methods, as suggested by Han et al. (2011). One simple non-hierarchical clustering technique is K-means clustering, as proposed by MacQueen (1967). Hierarchical clustering can be subdivided into agglomerative and divisive clustering. Both of these can be further grouped into three categories: single linkage clustering, complete linkage clustering and average linkage clustering. In this work we have employed agglomerative single linkage clustering and K-means clustering.
Although clustering techniques were known from the early part of the eighteenth century, their major applications started in the second half of the nineteenth century. Ward (1963) proposed hierarchical grouping to optimize an objective function. In 1967, MacQueen suggested some methods for classification and analysis of multivariate observations, and King proposed step-wise clustering procedures. Zahn (1971) then applied graph-theoretical methods for detecting and describing gestalt clusters. A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters was addressed by Dunn in 1973. In the same year, Sneath and Sokal focused on numerical taxonomy. Urquhart (1982) introduced graph-theoretical clustering based on limited neighbourhood sets. Murtagh (1984) surveyed recent advances in hierarchical clustering algorithms that use cluster centers. Gath and Geva (1989) worked on optimal fuzzy clustering, and conceptual clustering, categorization and polymorphy were addressed by Hanson and Bauer (1989).
In 1992, Celeux and Govaert presented a classification EM algorithm for clustering and two stochastic versions of it. Krishnapuram and Keller (1993) addressed a possibilistic approach to clustering. Wolpert and Macready (1997) dealt with the No Free Lunch theorems for optimization. Next, automatic subspace clustering of high-dimensional data for data mining applications was addressed by Agrawal et al. (1998). Nakayama and Kagaku (1998)

The Ministry of Tourism of the Government of India launched a special initiative called "Hunar Se Rozgar Tak" (HSRT) in 2009-10 to create employable skills among 8th-pass youths belonging to the economically weaker strata of Indian society. The programme is fully funded by the Ministry of Tourism. In 2016 the Ministry published a progress report of HSRT. The report consisted of several tables displaying the number of trainees who opted for each of four separate courses, namely Food Processing (FP), Food and Beverage (F&B), Bakery and Processing (B&P) and House Keeping Unit (HKU), in each of the institutes situated in different parts of India. These tables prompted the authors of this paper to analyse the data, so that the underlying facts relating to the training patterns, training institutes, time effects, etc. may be revealed. As described in the subsequent sections, clustering techniques and some significance testing have been employed. To the best of our knowledge and information, this kind of application of clustering techniques to group several training institutes, with subsequent analysis, has not been done before.

Objectives:-
The main objectives of this work are as follows:
1. To study the similarities or differences between the training patterns provided by the different institutes under HSRT.
2. To judge whether time has played any role in changing the pattern of allocation of trainees to the different courses of these institutes.
3. To find whether the geographical position of an institute plays any role in determining the relative frequencies of trainees opting for different types of training.
4. To determine whether any significant changes in the training infrastructure / condition of the different training institutes under HSRT occur with the passage of time.

Methodology:-
Secondary data showing the condition of the different institutes under HSRT, in terms of the number of trainees taking each of four courses (FP, F&B, B&P and HKU), have been collected from the progress report for HSRT published by the Ministry of Tourism, Government of India, in 2016. With these data, an effort has been made to place the institutes into different clusters based on their similarity.

We have used Agglomerative Hierarchical Clustering with single linkage. Agglomerative hierarchical methods start with individual objects, so there are initially as many clusters as objects. The most similar objects are grouped first, and these initial groups are then merged according to their similarities. Eventually, as the similarity decreases, all subgroups are fused into a single cluster. In single linkage clustering, the link between two clusters is made by a single pair of elements, namely the two elements (one in each cluster) that are closest to each other.

The resulting dendrogram suggests a number of clusters, say N. To determine the exact nature of membership in those N clusters, K-means clustering has been done. MacQueen suggested the term K-means for describing one of his algorithms, which assigns each item to the cluster having the nearest centroid (mean). In our work we have taken K equal to N. These N clusters are named C1, C2, ..., CN in descending order of the number of members in each of them.

The same procedure is repeated for each of the four years from 2013 to 2016. Then, to study whether the compositions of the clusters have undergone significant changes with the passage of time, the clustering compositions of the different years have been compared. The chi-square test has been employed to test whether the change (if any) is significant.
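The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SPSS procedure used in the study: the institute names and trainee counts are hypothetical, Euclidean distance is assumed, K-means is written as a simple batch iteration, and seeding K-means with the means of the single-linkage clusters is our own assumption made to keep the example deterministic.

```python
import math
from collections import Counter

def single_linkage(points, n_clusters):
    """Agglomerative single-linkage clustering, stopped at n_clusters groups."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the two most similar clusters
        del clusters[b]
    return clusters

def centroid(points, members):
    """Mean vector of the points whose indices are listed in members."""
    dims = range(len(points[0]))
    return tuple(sum(points[i][d] for i in members) / len(members) for d in dims)

def k_means(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, update means, repeat."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = centroid(points, members)
    return labels

def name_by_size(labels):
    """Rename clusters C1, C2, ... in descending order of membership."""
    ranked = [lab for lab, _ in Counter(labels).most_common()]
    rename = {lab: f"C{rank + 1}" for rank, lab in enumerate(ranked)}
    return [rename[lab] for lab in labels]

# Hypothetical trainee counts per institute for (FP, F&B, B&P, HKU).
institutes = {
    "Inst A": (50, 40, 10, 5),
    "Inst B": (52, 38, 12, 6),
    "Inst C": (48, 42, 9, 4),
    "Inst D": (5, 10, 60, 55),
    "Inst E": (6, 12, 58, 57),
    "Inst F": (200, 5, 5, 5),
}
points = list(institutes.values())
N = 3  # number of clusters read off the dendrogram (assumed here)

groups = single_linkage(points, N)
start = [centroid(points, g) for g in groups]
named = name_by_size(k_means(points, start))
for name, cluster in zip(institutes, named):
    print(name, cluster)
```

In this toy data set the isolated institute ("Inst F") ends up alone in the smallest cluster, which is the kind of behaviour reported for a singleton cluster in the findings.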

Analysis and Findings
As already mentioned in the Introduction section, our dataset consists of the frequencies of trainees opting for each of the four courses in each of the institutes under HSRT. Initially, with these data, Hierarchical Agglomerative Clustering of the institutes has been done with the help of SPSS 16.0.

From the dendrogram it is evident that if the maximum distance permitted between any two elements of the same cluster is set to a reasonable value (say 4 units), approximately 7 different clusters are noticed. To determine the exact nature of membership in those 7 clusters, K-means clustering (taking K = 7) has been done using SPSS 16.0. These 7 clusters have been ordered according to the number of members in each cluster (i.e., the cluster having the maximum number of members is named C1 and the cluster having the minimum number of members is named C7). The same procedure has been followed for each year. The seven clusters obtained in each of the four years are presented [using the format: Cluster name/number (name of the members)(number of members in the cluster)] below.

From the clusters of each year, it is noticed that most of the members remained in the same cluster with more or less the same set of co-members over the years. This implies that either all these member institutes have had the same conditions, including infrastructure, throughout this time period (2013-16), or their conditions have jointly changed in the same manner; i.e., there has not been any special noticeable change for any specific individual institute that would isolate it from the others. At a glance, some characteristic features are observed:
1. FCI JAMMU has its own distinct identity, which is completely different from that of any other institute under consideration. This is demonstrated by the fact that it has always remained the sole member of a single cluster in all four years.
One may further study the infrastructure and conditions of this institute to analyze its uniqueness. To judge conclusively whether the clustering of these institutes has significantly changed over the years, the clustering pattern of one year has been compared with that of another year, for different pairs of years, using the χ² test.

Comparison between 2013 and 2014
One may be interested to study whether the clustering of the institutes in 2013, in terms of the composition of the clusters, has undergone a significant change in 2014. If there is no significant change, it would imply that, with the passage of time, the member institutes have had the same conditions, including infrastructure, throughout the period 2013-14, or else their conditions have jointly changed in the same manner; i.e., there has not been any special noticeable change for any individual institute.

Table 5:-Transition frequency -2013-2014.
Table 6 shows the transition or non-transition of members from the clusters of 2014 to the clusters of 2015. For example, the figure 13 in cell (1, 1) indicates that 13 members of cluster C1 of 2014 have remained in cluster C1 of 2015, while 1 member has shifted to cluster C2 (cell (1, 2)) and another member has shifted to cluster C4 (cell (1, 4)) of 2015.
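How such a transition frequency table is built from two years of cluster memberships can be sketched as follows. This is a minimal illustration under our own assumptions: the institute names and memberships are hypothetical (they are not the HSRT data), and `transition_table` is a helper name introduced only for this example.

```python
from collections import Counter

def transition_table(labels_from, labels_to, k):
    """Count institutes that move from cluster i (earlier year) to cluster j (later year).

    labels_from and labels_to map institute name -> cluster number in 1..k.
    Row i, column j of the result (0-based here, 1-based in the text) holds the
    number of institutes that were in Ci in the earlier year and in Cj later.
    """
    counts = Counter((labels_from[name], labels_to[name]) for name in labels_from)
    return [[counts[(i, j)] for j in range(1, k + 1)] for i in range(1, k + 1)]

# Hypothetical memberships for five institutes and K = 3.
year_a = {"Inst1": 1, "Inst2": 1, "Inst3": 1, "Inst4": 2, "Inst5": 3}
year_b = {"Inst1": 1, "Inst2": 1, "Inst3": 2, "Inst4": 2, "Inst5": 3}

table = transition_table(year_a, year_b, k=3)
for row in table:
    print(row, "row total:", sum(row))
```

Large counts on the diagonal correspond to institutes that stayed in the same cluster, while off-diagonal counts correspond to institutes that shifted.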
Here, degrees of freedom, df = (7-1)(7-1) = 36.
Calculated χ² = 142.132

[Transition frequency table with clusters C1 to C7 as rows and columns, plus row and column totals; only the row total for C1 (18) and the column totals (24, 8, 6, 5, 3, 3, 1; overall total 50) are recoverable.]

Table 7:-Transition frequency -2015-2016.
Table 7 shows the transition or non-transition of members from the clusters of 2015 to the clusters of 2016. With these data also the χ² test has been done, and the following result is found. Calculated χ² = 158.557. For 36 degrees of freedom at the 5% level of significance, tabulated χ² = 50.998, i.e., cal χ² >> tab χ².

Table 8 shows the transition or non-transition of members from the clusters of 2013 to the clusters of 2016. With these data also the χ² test has been done, and the following result is found. Calculated χ² = 96.022. For 36 degrees of freedom at the 5% level of significance, tabulated χ² = 50.998, i.e., cal χ² >> tab χ². So H0, the hypothesis that an institute's cluster in 2016 is independent of its cluster in 2013, is rejected, and there is a dependency between the clusters of 2013 and 2016. Hence the passage of time does not have any significant impact on the clustering in the long term either. The relative nature of the training provided by the different institutes did not undergo any significant alteration in 2016 as compared to 2013.
Thus no particular institute can be singled out for showing any contrasting change in the training pattern it offered over the years.
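The χ² computation used in these comparisons can be sketched as below. This is a minimal illustration, not the authors' SPSS output: the 3 × 3 table is hypothetical, the function name is our own, and every row and column is assumed to have a non-zero total. The statistic is the sum over cells of (observed - expected)² / expected, where expected = (row total × column total) / grand total, and df = (rows - 1)(columns - 1). For the 7 × 7 tables of the paper, df = 36 and the tabulated 5% value quoted above is 50.998; for the 3 × 3 toy table, df = 4 and the tabulated 5% value is 9.488.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    Assumes every row and every column has a non-zero total, so that all
    expected frequencies are positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Hypothetical 3 x 3 transition table (rows: earlier year, columns: later year).
toy_table = [
    [10, 1, 1],
    [1, 8, 1],
    [0, 1, 7],
]
CRITICAL_5_PERCENT_DF4 = 9.488  # tabulated chi-square value for df = 4

statistic, df = chi_square_independence(toy_table)
reject_h0 = statistic > CRITICAL_5_PERCENT_DF4
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

Rejecting H0 here means that cluster membership in the later year depends on membership in the earlier year, which is how the paper reads a calculated χ² far above the tabulated value.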

Conclusion:-
This paper is the outcome of a simple effort to apply clustering techniques to compare the training patterns (as well as the pattern of allocating trainees to different courses) of the different training institutes under a Government project. To the best of our knowledge and information, this kind of study involving clustering of training institutes has not been done before. The available data set, although fully authentic, had limitations: it lacked variety, and only the frequencies of trainees opting for each of four different courses in each of the institutes were available. However, analysis using clustering techniques and the chi-square test has revealed some interesting results. The study reveals that the training pattern has not undergone any significant change during the period 2013-2016. In Government training institutes the infrastructural condition does not change much with the passage of time. Some training institutes, like FCI JAMMU and IHM SRINAGAR, have unique characteristics. One may undertake further study of the infrastructure and other conditions of these institutes to analyze their uniqueness. In this regard it is also felt that the geographical positions of these two institutes have a crucial role to play in this uniqueness.