Unraveling the Unseen: Exploring Unsupervised Anomaly Detection Techniques

By Team Algo

by Mihir Godbole

In today’s interconnected world, data is generated at an unprecedented pace in every aspect of our lives. From financial transactions and network communications to industrial operations and healthcare records, the sheer volume and complexity of data make it challenging to identify anomalies that may signal potential threats or opportunities. Traditional rule-based systems and supervised learning approaches fall short here because they rely on predefined rules and labeled data, whereas unsupervised learning applies when labeled data is not available. The objective of anomaly detection is to identify and isolate abnormal data points in a given dataset, and to predict whether a new data point is anomalous. Since anomalous data points rarely fit a predefined category, the task is usually treated as unsupervised or semi-supervised.

Anomaly Detection Use Cases

Anomaly detection is applied in fraud detection (insurance, banking), intrusion detection (computer networks, national surveillance), medical informatics (diagnosis, disorder detection) and fault or damage detection (commerce, industry). The nature of the data influences the choice of algorithm. For example, in time series data, anomalies show up as outliers: values that deviate sharply from the rest of the series.
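The time series case can be illustrated with a short sketch. It assumes that an outlier is a value whose modified z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The function name `mad_outliers`, the threshold of 3.5 (a common rule of thumb) and the sample readings are illustrative assumptions, not something prescribed by this article.

```python
from statistics import median


def mad_outliers(series, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(series)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(x - med) for x in series])
    if mad == 0:
        # More than half of the values are identical; this sketch does not score them.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score on Gaussian data.
    return [i for i, x in enumerate(series)
            if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical sensor readings with one obvious spike at index 5.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2, 9.7, 10.1, 10.0]
print(mad_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because a single large spike would otherwise inflate the very statistics it is compared against.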

Anomaly detection is also used to identify defects in manufactured products, where images of the products serve as the data. Anomaly detection in video surveillance combines the characteristics of both use cases, since video is image data that evolves over time.

Most approaches analyze the data distribution to get a sense of what “normal” data points look like. They then either model the normal data points and flag whatever deviates from them, or isolate the anomalous points directly.

Isolation Forest

Isolation Forest, as the name suggests, uses an ensemble of decision trees to isolate anomalies. Each tree splits the data on randomly chosen features and thresholds, and anomalous points tend to be separated from the rest in fewer splits than normal points. The technique is widely used because it is fast to compute, and it works well on time series data. The input must be a set of features for each data sample, so even image data can be used once key features that accurately represent each image have been extracted; these features are then used to train the Isolation Forest model. (*This approach is used in our smart video surveillance system, Aksha.)
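The isolation idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in Aksha: the function names (`fit_forest`, `anomaly_score`), the tree count, the subsample size and the toy data are all hypothetical. Points that are isolated after only a few random splits receive scores closer to 1, while points buried in dense regions receive lower scores.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split the points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, plus a correction for unsplit leaves."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if point[node["feature"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample of the data."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size


def anomaly_score(point, forest):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    trees, sample_size = forest
    mean_depth = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Toy data (an assumption): 200 two-feature samples drawn around the origin.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
forest = fit_forest(normal)
print("inlier score: ", anomaly_score((0.0, 0.0), forest))
print("outlier score:", anomaly_score((8.0, 8.0), forest))
```

For real workloads, a maintained library implementation such as scikit-learn's `IsolationForest` is usually a better choice than a hand-written one.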

The following methods deal with image data. Their practical use is to identify visual defects in objects, with the goal of matching real-world visual inspection in industry. They can therefore be extended to automated part inspection in manufacturing.

SPADE: Sub-Image Anomaly Detection with Deep Pyramid Correspondences

Semantic Pyramid Anomaly Detection (SPADE) [3] detects and localizes anomalies by comparing deep features of a test image with those of normal images. A convolutional neural network pretrained on ImageNet extracts features from every image. For a test image, the K nearest normal training images are retrieved in feature space, and the distance to them gives an image-level anomaly score. To localize the anomaly, the method describes each pixel with a feature pyramid that combines activations from several network layers, capturing both fine detail and broader semantic context. Each pixel of the test image is then matched against the pixel features of the retrieved normal images, and pixels with no close correspondence are flagged as anomalous. Because the features come from a pretrained network, no training on the inspection data is needed beyond extracting and storing the features of normal images.
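In the SPADE paper [3], scoring rests on nearest-neighbour distances between deep features of a test image and features of normal training images. The sketch below illustrates only that scoring step. It assumes the features have already been extracted, so short hypothetical vectors stand in for CNN activations, and the names `knn_anomaly_score` and `anomaly_map` are illustrative rather than taken from any library.

```python
import math


def knn_anomaly_score(query, normal_features, k=3):
    """Mean Euclidean distance from the query to its k nearest normal feature vectors."""
    distances = sorted(math.dist(query, f) for f in normal_features)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def anomaly_map(test_pixels, gallery, k=1):
    """Score every per-pixel feature of a test image against a gallery of normal pixel features."""
    return [[knn_anomaly_score(f, gallery, k) for f in row] for row in test_pixels]


# Hypothetical feature vectors of normal images (stand-ins for CNN features).
normal_features = [
    (0.0, 0.1, 0.9),
    (0.1, 0.0, 1.0),
    (0.0, 0.0, 1.1),
    (0.1, 0.1, 1.0),
]

normal_query = (0.05, 0.05, 1.0)   # resembles the normal images
defect_query = (0.9, 0.8, 0.2)     # far from every normal image

print(knn_anomaly_score(normal_query, normal_features))
print(knn_anomaly_score(defect_query, normal_features))

# A 1x2 "image" of per-pixel features: the second pixel has no close normal match.
pixel_scores = anomaly_map([[(0.0, 0.1, 1.0), (0.9, 0.9, 0.1)]], normal_features)
print(pixel_scores)
```

The same distance function serves both levels: applied to one vector per image it gives an image-level score, and applied per pixel it gives a localization map.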

PaDiM: a Patch Distribution Modeling Framework for Anomaly Detection and Localization

Patch Distribution Modeling (PaDiM) concurrently detects and localizes anomalies in images in a one-class learning setting. It uses a pretrained convolutional neural network (CNN) for patch embedding and multivariate Gaussian distributions to obtain a probabilistic representation of the normal class. The framework assumes that normal image patches can be modeled by a probability distribution, so that anomalies appear as deviations from it: at test time, the Mahalanobis distance between a patch embedding and the Gaussian learned for its position serves as the anomaly score. Because every patch is scored, PaDiM yields an anomaly map as well as an image-level decision, which makes it applicable wherever image inspection matters.
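The Gaussian modeling step can be sketched as follows. To keep the example dependency-free, it assumes two-dimensional patch embeddings so that the covariance matrix can be inverted in closed form; real embeddings from a CNN have far more dimensions. The names `fit_gaussian` and `mahalanobis`, the regularization value `eps` and the toy embeddings are assumptions made for illustration.

```python
import math


def fit_gaussian(embeddings, eps=0.01):
    """Estimate the mean and a regularized 2x2 covariance from 2-D patch embeddings."""
    n = len(embeddings)
    mx = sum(e[0] for e in embeddings) / n
    my = sum(e[1] for e in embeddings) / n
    # eps is added to the diagonal so the covariance stays invertible.
    sxx = sum((e[0] - mx) ** 2 for e in embeddings) / (n - 1) + eps
    syy = sum((e[1] - my) ** 2 for e in embeddings) / (n - 1) + eps
    sxy = sum((e[0] - mx) * (e[1] - my) for e in embeddings) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(embedding, gaussian):
    """Mahalanobis distance of a 2-D embedding from a fitted Gaussian."""
    (mx, my), (sxx, sxy, syy) = gaussian
    dx, dy = embedding[0] - mx, embedding[1] - my
    det = sxx * syy - sxy * sxy
    # Closed-form inverse of the 2x2 covariance applied to the offset (dx, dy).
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)


# Toy data: 5 normal images, each with an embedding at 2 patch positions.
train = [
    [(1.0, 2.0), (5.0, 5.1)],
    [(1.1, 2.1), (5.2, 4.9)],
    [(0.9, 1.9), (4.8, 5.0)],
    [(1.0, 2.2), (5.1, 5.2)],
    [(1.2, 2.0), (4.9, 4.8)],
]

# One Gaussian per patch position, fitted across all normal images.
n_positions = len(train[0])
gaussians = [fit_gaussian([image[p] for image in train]) for p in range(n_positions)]

# The second patch of the test image lies far from its normal distribution.
test_image = [(1.05, 2.05), (7.0, 3.0)]
scores = [mahalanobis(e, g) for e, g in zip(test_image, gaussians)]
print(scores)
```

Arranging the per-position scores back into the patch grid gives the anomaly map, and taking its maximum is one simple way to obtain an image-level score.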

DiffusionAD: Denoising Diffusion for Anomaly Detection

This approach builds on the recently popularized denoising diffusion model. A denoising sub-network reconstructs a normal, anomaly-free approximation of the input image, and a segmentation sub-network then uses the input and its reconstruction to localize the anomaly. It is the best-performing model on DAGM [7], a dataset for defect detection on textured surfaces. The metrics of the above three methods on the DAGM [7] dataset are given below.

Conclusion

Detecting anomalies matters because of their potential impact on business operations, security, and even human lives. Unsupervised anomaly detection has emerged as a powerful tool in the ever-expanding landscape of data analysis: by letting machines uncover hidden patterns and irregularities on their own, it identifies anomalies without labeled examples or predefined rules. From flagging fraudulent financial transactions and bolstering cybersecurity to optimizing industrial processes and enabling early disease diagnosis, it empowers organizations to address risks proactively, uncover hidden opportunities, and make informed decisions based on a deeper understanding of their data.

References:

[1] https://www.datrics.ai/anomaly-detection-definition-best-practices-and-use-cases

[2] https://arpitbhayani.me/blogs/isolation-forest

[3] https://arxiv.org/pdf/2005.02357v3.pdf

[4] https://arxiv.org/pdf/2011.08785v1.pdf

[5] https://arxiv.org/pdf/2303.08730v2.pdf

[6] https://paperswithcode.com/task/unsupervised-anomaly-detection

[7] DAGM dataset: https://paperswithcode.com/dataset/dagm2007