Data-Defect Inspection With Kernel-Neighbor-Density-Change Outlier Factor

Abstract

Data defects degrade data quality and the results of data-mining analysis. This paper presents a data-defect inspection method based on the kernel-neighbor-density-change outlier factor (KNDCOF). A kernel neighbor density is defined to represent the density of each object in the database, and the ascending distance series (ADS) of each object is computed from the kernel distances between the object and its neighbors. The average density fluctuation (ADF) of the object is then defined as the weighted sum of the squared density differences between the object and the other objects in its ADS. Finally, the KNDCOF of the object is the ratio of its ADF to the average ADF of its neighbors, and its value indicates the degree to which the object is an outlier. Experiments on three real data sets evaluate the effectiveness of the proposed method. The results show that the method achieves higher data-defect inspection quality without increasing the time complexity.
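
The pipeline summarized in the abstract (kernel distance, ADS, kernel neighbor density, ADF, KNDCOF) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so every concrete choice below is an assumption made only for illustration: a Gaussian kernel with bandwidth `sigma`, density taken as the inverse of the mean kernel distance to the `k` nearest neighbors, ADS weights proportional to 1/rank, and a small `eps` to avoid division by zero. The function names `kernel_distance` and `kndcof_sketch` are hypothetical and do not come from the paper.

```python
import math


def kernel_distance(x, y, sigma=1.0):
    """Distance induced by a Gaussian kernel (assumed kernel choice).

    For k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) we have k(x, x) = 1,
    so the feature-space distance is sqrt(2 - 2 k(x, y)).
    """
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    k_xy = math.exp(-sq / (2.0 * sigma ** 2))
    return math.sqrt(max(2.0 - 2.0 * k_xy, 0.0))


def kndcof_sketch(points, k=3, sigma=1.0, eps=1e-12):
    """Return one outlier score per point (illustrative, not the paper's exact formulas)."""
    n = len(points)
    ads = []      # ads[i]: indices of the k nearest neighbors, ascending by kernel distance
    density = []  # assumed density: inverse of the mean kernel distance over the ADS
    for i in range(n):
        nearest = sorted(
            (kernel_distance(points[i], points[j], sigma), j)
            for j in range(n) if j != i
        )[:k]
        ads.append([j for _, j in nearest])
        density.append(k / (sum(d for d, _ in nearest) + eps))

    # Assumed weighting: closer entries of the ADS count more (weight 1/rank, normalized).
    weights = [1.0 / rank for rank in range(1, k + 1)]
    total = sum(weights)

    # ADF: weighted sum of squared density differences between a point and its ADS members.
    adf = [
        sum(w * (density[j] - density[i]) ** 2 for w, j in zip(weights, ads[i])) / total
        for i in range(n)
    ]

    # Score: ratio of a point's ADF to the average ADF of its neighbors.
    return [adf[i] / (sum(adf[j] for j in ads[i]) / k + eps) for i in range(n)]


# Toy data: 12 points evenly spaced on a unit circle plus one isolated point.
ring = [(math.cos(2 * math.pi * t / 12), math.sin(2 * math.pi * t / 12)) for t in range(12)]
points = ring + [(5.0, 0.0)]
scores = kndcof_sketch(points)
most_suspicious = max(range(len(scores)), key=scores.__getitem__)
```

On this toy data the ring points all have essentially the same density, so their ADF is close to zero, while the isolated point at index 12 differs sharply in density from its neighbors and receives the largest score. On real data the choice of `k`, `sigma`, and the weighting scheme would need to follow the definitions in the full paper.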
