Abstract
When deep learning models classify actual data and outlier data, most of them lack sufficient information about the outliers, which can cause misclassification. An efficient way to analyze outlier data through visualization is therefore needed. We propose a visualization method that combines LBP, LLE, and SMOTE for outlier data detection. Furthermore, we introduce a new confusion visualization method that uses the similarity of pixel density distributions. We also present a new histogram visualization method that uses the frequency of the distribution of pixel positions in EDA. To validate its effectiveness, we compared the proposed method with UMAP and LLE. In the evaluation, the outlier data are generated by three types of GAN (Vanilla GAN, DRAGAN, and EBGAN). The results show that the proposed visualization method is useful for outlier detection.
Author Keywords
- Combined LBP LLE with SMOTE visualization
- Confusion visualization
- Histogram visualization
- Outlier data
IEEE Keywords
- Data visualization
- Data models
- Gallium nitride
- Manifolds
- Machine learning
- Histograms
- Anomaly detection
Introduction
Detecting outlier data is essential for a deep learning model. When the model encounters outlier data, its parameters can change substantially, because they hold little or no information about such data. A model altered in this way misclassifies many new inputs at inference time. It is therefore important to visualize and identify outlier data in advance so that the model's parameters do not vary much. In this paper, we study a new visualization method for detecting outliers efficiently.
The contributions of this paper are as follows. First, we propose a visualization technique that combines LBP and LLE with SMOTE for outlier data detection. Second, we propose a confusion visualization method that uses the similarity of pixel density distributions. Third, we propose a histogram visualization that uses the frequency of the distribution of pixel positions in EDA (Exploratory Data Analysis).
Research has been done on visualizing the distribution of data in deep learning. Five methods are widely known. First, MDS (Multi-Dimensional Scaling) [1] visualizes the level of similarity of individual cases in a data set. It takes the pairwise distances among a set of N objects or individuals and maps them to N points in an abstract Cartesian space. Second, LLE (Locally Linear Embedding) [2] focuses on the locality of each point's nearest neighbors. LLE [2] preserves only local structure when it learns the embedding space. However, because each point is mapped to the embedding space through a linear combination of its neighbors, preserving neighborhood information also preserves the global structure to some extent. Third, ISOMAP [3] builds a nearest-neighbor graph, computes distances along the manifold as shortest paths in the graph, and maps the points to the embedding space. The result is a two-dimensional plane that preserves the relationships between neighbors on the manifold. Fourth, t-SNE [4] reduces the dimension while preserving locality, so that points located close together in the high-dimensional space x remain close together in the low-dimensional embedding space. The similarity information between points can then be used to learn how to cluster efficiently in the embedding space. Fifth, UMAP [5] assumes that the data are uniformly distributed on a Riemannian manifold. Since the Riemannian metric is assumed to be locally constant, distances are computed locally, in the regions where the manifold is locally connected. The topology is built from the combinatorial components of a complex and refined by probabilistic optimization, which yields topological spaces that reduce the complexity of dealing with continuous geometry.
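To make the "linear combination of neighbors" idea behind LLE [2] concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `lle_weights` and `lle_embed`, the neighbor count `k`, and the regularization constant `reg` are all hypothetical choices made for this example.

```python
import numpy as np

def lle_weights(X, k=4, reg=1e-3):
    """Row i of the result reconstructs X[i] from its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances; a point is never its own neighbor.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        Z = X[nbrs] - X[i]                 # neighbors centered on X[i]
        G = Z @ Z.T                        # local Gram matrix, shape (k, k)
        # Assumed regularization so the solve is stable when k > dimension.
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()           # weights sum to one
    return W

def lle_embed(X, k=4, dim=2):
    """Embed X with the bottom eigenvectors of (I - W)^T (I - W)."""
    W = lle_weights(X, k=k)
    I = np.eye(W.shape[0])
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]              # skip the constant eigenvector

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))               # toy data, not from the paper
Y = lle_embed(X, k=5, dim=2)
print(Y.shape)                             # (30, 2)
```

The sketch requires `k` to be smaller than the number of points. Because the weights depend only on each point's neighborhood, the embedding preserves local structure directly and global structure only indirectly, which matches the description of LLE above.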
Comparing the pixel value density of outlier data with that of existing data allows for more precise comparisons. Fig. 1(d) shows that histogram visualization using the frequency of the distribution of pixel positions in EDA compares the similarity of the locations of the pixel distributions. Each technique has been described in detail.
Fig. 2 shows four methods for experimental verification of the proposed method. Fig. 2(a) is the existing method. Fig. 2(b) applies LLE and SMOTE [7]; its purpose is to observe the effect of correcting, through linear sampling, an imbalance in the amount of information in the feature map. Fig. 2(c) applies sampling and linear combination after applying LBP (Local Binary Pattern) [6]; its purpose is to see the effect of linearly correcting the imbalance of the sampling result when the data are converted to binary features before sampling.
First, combined LBP LLE SMOTE efficiently shows the relationships between data by handling information imbalance and applying binary processing. Second, we propose a method of visualizing the characteristics of data using the frequency of pixels. The number of pixels at each value of the R, G, and B channels of an existing image is available. This pixel frequency is more robust to the two types of change mentioned above, because the frequency of occurrence is counted regardless of changes in value or position.
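The building blocks named above can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the authors' pipeline: `lbp_codes` is a basic 8-neighbor LBP, `smote_like` is a simplified SMOTE-style interpolation between nearest neighbors (it does not handle class labels as SMOTE [7] does), and `value_histogram` assumes 8-bit pixel values. All three function names and their parameters are hypothetical.

```python
import numpy as np

def lbp_codes(img):
    """8-neighbor Local Binary Pattern codes for the interior pixels."""
    img = np.asarray(img)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    # Clockwise neighbor offsets, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbor is at least as bright as the center.
        codes |= (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def smote_like(X, n_new, k=2, seed=0):
    """Synthesize rows on the segment between a row and a near neighbor."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None] - X[None]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    pick = nbrs[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))           # interpolation factor in [0, 1)
    return X[base] + gap * (X[pick] - X[base])

def value_histogram(img, bins=256):
    """Relative frequency of each pixel value, independent of position."""
    counts = np.bincount(np.asarray(img, dtype=np.int64).ravel(),
                         minlength=bins)
    return counts / counts.sum()

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 8, 8))   # toy grayscale images
features = np.stack([lbp_codes(im).ravel() for im in images])
augmented = smote_like(features, n_new=4)       # extra rows for an embedding
print(features.shape, augmented.shape)          # (6, 36) (4, 36)
```

In this sketch the binary LBP features are computed first and the interpolated samples are added afterwards, following the order described for Fig. 2(c); the resulting rows could then be passed to an LLE embedding. The histogram function illustrates the position-independence claim: shuffling the pixels of an image leaves its value histogram unchanged.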
Conclusion
We proposed three visualization techniques. Combined LBP LLE SMOTE has the advantage of generalized data visualization, because it corrects unbalanced data characteristics through sparse data correction. With it, we confirmed that the method is efficient for fake image discrimination. The visualization of pixel density similarity has the advantage of efficiently detecting wrongly generated pixel position information through similarity analysis of pixel positions, and of showing the correlation of pixel information. The visualization of pixel density frequency has the advantage of extracting fake pixels through the frequency of pixel values obtained from density distribution analysis. The method proposed in this paper can be applied to the detection of real-world images. These results confirm that visualization methods for analyzing outlier data should be studied further in order to discriminate data similar to actual data [10].
Bibliography
Title: Visualization Techniques for Outlier Data
Year: 2020
Published in: 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 2020, pp. 346-351
DOI: 10.1109/ICAIIC48513.2020.9065228
Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.
Somayeh Nosrati: https://ksra.eu/author/somayeh/
Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.
Siavosh Kaviani: https://ksra.eu/author/ksadmin/
Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.
Nasim Gazerani: https://ksra.eu/author/nasim/