Outlier Detection in Imbalanced Data Classification

M. Kamaladevi, K. R. Sekar, V. Venkataraman, K. Kannan

semanticscholar (2019)

Abstract
In binary classification, class imbalance arises when the classes in a dataset are not uniformly distributed, so that the instances of one class significantly outnumber those of the other. Classification algorithms are then biased toward the majority class, and overall accuracy reflects performance on the majority class rather than on minority-class instances, which degrades the classifier. To improve performance on the minority class, the characteristics of minority instances, such as borderline, rare, and outlier instances, have to be analyzed. An outlier, or anomaly, is a point that deviates from the normal behavior exhibited by the other points in the data. Detecting outliers among class instances is still an open research problem. In this article, two density-based outlier detection methods are compared: the k-nearest neighbors (KNN) method and the Local Outlier Factor (LOF). The KNN algorithm, which is a classification algorithm, is a global density-based method, while LOF is a local density-based method. Both methods are applied to the imbalanced Breast Cancer-W dataset, consisting of 569 instances and 33 variables, taken from the UCI (University of California, Irvine) Machine Learning Repository. The accuracy of both algorithms (the percentage of observations correctly identified) is measured and their performance is analyzed. The LOF method was found to give a better view of the outlier data than the KNN method.
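To illustrate the global versus local distinction, here is a minimal NumPy sketch. It assumes the common k-th-nearest-neighbor distance score for the global KNN method and the standard LOF definition (reachability distance, local reachability density); the abstract does not give the authors' exact formulation, value of k, or preprocessing. The function names, k = 3, and the toy data are hypothetical and are not the Breast Cancer-W dataset. On this data, the KNN distance ranks the corner points of a sparse cluster as most outlying, while LOF singles out a point that is unusual only relative to the dense cluster beside it.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix; the diagonal is inf so a point is never its own neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    return D


def knn_outlier_scores(X, k=3):
    """Global score (assumed formulation): distance to the k-th nearest neighbor."""
    D = pairwise_distances(X)
    return np.sort(D, axis=1)[:, k - 1]


def lof_scores(X, k=3):
    """Local Outlier Factor using exactly k neighbors per point (distance ties ignored).
    Assumes no duplicate points, which would make a reachability mean zero."""
    D = pairwise_distances(X)
    nbrs = np.argsort(D, axis=1)[:, :k]           # indices of the k nearest neighbors
    k_dist = D[np.arange(len(X)), nbrs[:, -1]]    # k-distance of every point
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbor o of p
    reach = np.maximum(k_dist[nbrs], np.take_along_axis(D, nbrs, axis=1))
    lrd = 1.0 / reach.mean(axis=1)                # local reachability density
    # LOF(p) = mean ratio of the neighbors' lrd to p's own lrd (about 1 inside a cluster)
    return lrd[nbrs].mean(axis=1) / lrd


# Hypothetical toy data: a dense 5x5 grid, a sparse 5x5 grid, and one point near the dense grid
g = np.arange(5)
dense = np.array([(0.1 * i, 0.1 * j) for i in g for j in g])
sparse = np.array([(10 + 2.0 * i, 10 + 2.0 * j) for i in g for j in g])
X = np.vstack([dense, sparse, [[1.0, 0.2]]])      # index 50 is the local outlier

knn = knn_outlier_scores(X, k=3)
lof = lof_scores(X, k=3)
print("KNN top point:", np.argmax(knn), "LOF top point:", np.argmax(lof))
```

Because the KNN score is an absolute distance, any point in a sparse region looks anomalous. LOF divides by the density of each point's own neighborhood, so its scores stay near 1 inside either cluster and rise only for points that are sparse relative to their neighbors.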