Abstract
The newly proposed weighted k nearest neighbour is known as standard deviation K nearest neighbour(SDKNN) classifier technique. It is based on the principle of standard deviation. Standard deviation measures spreading of attribute about mean. Spreading of attribute plays a significant role to improve the classification accuracy of a dataset. Most of our distance calculation method between two points is determined by using euclidean distance process for finding nearest neighbour. Our proposed technique is based on a new distance calculation formula to find nearest neighbour in KNN. We apply here standard deviations of attributes as power for calculating distance between train dataset and test dataset. Distance calculation between two points in k nearest neighbour classifier is modified according to the standard deviation of attribute. In this paper, standard deviation of attributes are used. In first attempt, we have used standard deviation of attributes as power for calculating K Nearest Neighbour to improve classification accuracy and in second attempt, based on mean of standard deviation attributes, distance in K Nearest Neighbour is processed to further improve the classification accuracy. Our concept is implemented on Pima Indian Diabetes Dataset (PIDD). The analysis on Pima Indian Diabetes Dataset (PIDD) is carried out by splitting dataset in to 90% training data and 10% testing data. We have found that, in our proposed technique, average classification accuracy gives result 83.2%, a great improvement as compared to other conventional technique.
Export citation and abstract BibTeX RIS
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.