4 min read

Detecting Small Objects from Images/Videos using AI

By Team Algo

By Mugdha Thigle, Associate Data Scientist at AlgoAnalytics

Figure 1. Detecting “Small Objects”—A ship from the satellite image

Object detection is a computer vision and image processing technique that detects instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. As one of the most fundamental and challenging problems in computer vision, object detection has received great attention in recent years. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance. It is also used for tracking objects, for example tracking a ball during a football match, tracking the movement of a cricket bat, or tracking a person in a video.

While generic object detectors perform well on medium and large objects, they perform poorly on small ones. Examples of small objects include ships seen in satellite images (as shown in Fig. 1) and traffic signs captured from far away in drone imagery. Small object detection is a challenging task in computer vision because such objects occupy only a few pixels and therefore carry limited resolution and information. In this article we explore Feature Pyramid Networks for small object detection and Super Resolution GANs for data augmentation and performance improvements.

Feature Pyramid Networks[1]

Figure 2. Feature Pyramid Network

Feature pyramids[2] are a basic component in recognition systems for detecting objects at different scales. However, recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. The Feature Pyramid Network (FPN), shown in Fig. 2, uses a top-down architecture with lateral connections to build high-level semantic feature maps at all scales. It shows significant improvement as a generic feature extractor in several applications. When implemented on the Airbus ship dataset[3], a collection of satellite images of ships in the ocean, it achieved a recall of 0.954 and an mAP of 0.911. Sample results are shown in Fig. 3 and Fig. 4.
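To make the top-down pathway and lateral connections concrete, here is a minimal NumPy sketch. It is not the model used for the results above: the backbone maps, channel counts, and random weights are hypothetical placeholders, and the 3x3 smoothing convolution that FPN implementations usually apply to each merged map is left out for brevity.

```python
import numpy as np

def lateral_1x1(feature, weight):
    """1x1 convolution: project C_in channels to C_out at every pixel.
    feature has shape (C_in, H, W); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, feature)

def upsample2x(feature):
    """Nearest-neighbour upsampling by a factor of 2 in height and width."""
    return feature.repeat(2, axis=1).repeat(2, axis=2)

def fpn_top_down(backbone_maps, lateral_weights):
    """Merge backbone maps (ordered fine -> coarse, each half the size of
    the previous one) into pyramid maps with a shared channel count."""
    laterals = [lateral_1x1(f, w) for f, w in zip(backbone_maps, lateral_weights)]
    pyramid = [laterals[-1]]  # the coarsest level starts the top-down path
    for lateral in reversed(laterals[:-1]):
        # Upsample the coarser pyramid level and add the lateral connection.
        pyramid.append(lateral + upsample2x(pyramid[-1]))
    return pyramid[::-1]  # return in fine -> coarse order again

rng = np.random.default_rng(0)
# Hypothetical backbone outputs at three scales (channels, height, width).
c3 = rng.standard_normal((8, 16, 16))
c4 = rng.standard_normal((16, 8, 8))
c5 = rng.standard_normal((32, 4, 4))
# Hypothetical (random) 1x1 weights mapping every level to 4 channels.
weights = [rng.standard_normal((4, c.shape[0])) for c in (c3, c4, c5)]

p3, p4, p5 = fpn_top_down([c3, c4, c5], weights)
print([p.shape for p in (p3, p4, p5)])
```

The finest map, p3, keeps its full spatial resolution while also receiving information from the coarser, more semantic levels, which is the property that makes this design attractive for small objects.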

Figure 3. Small objects (ships) detected by FPN

Figure 4. Small objects (ships) detected by FPN
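Recall figures such as the one reported above depend on how predicted boxes are matched to ground-truth boxes. The sketch below shows one common way to do this, using intersection over union (IoU) with greedy matching. The 0.5 threshold, the box format, and the example boxes are assumptions for illustration; the article does not state which matching rule was used.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall(ground_truth, predictions, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by a not-yet-used prediction."""
    unused = list(predictions)
    matched = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= iou_threshold:
            matched += 1
            unused.remove(best)  # each prediction can match only one object
    return matched / len(ground_truth) if ground_truth else 0.0

# Hypothetical example: two small ships, one detected and one missed.
ships = [(10, 10, 20, 20), (50, 50, 58, 58)]
detections = [(11, 11, 21, 21), (80, 80, 90, 90)]
print(recall(ships, detections))
```

For small objects, a shift of a single pixel already lowers the IoU noticeably (the first pair above overlaps by only 81/119), which is one reason small object detection scores tend to be lower than those for larger objects.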

Super Resolution[4]

Super Resolution (SR) is the process of recovering a High Resolution (HR) image from a given Low Resolution (LR) image. An image may have a "lower resolution" because of a smaller spatial resolution (i.e. size) or because of degradation (such as blurring). SR has received substantial attention from the computer vision research community and has a wide range of applications. Since one of the main issues with small object detection is the lack of picture clarity and resolution, we expected that performing super resolution on the images might help. For this, SRGAN[5] was used. During training, a high-resolution (HR) image is downsampled to a low-resolution (LR) image. A GAN generator upsamples LR images to super-resolution (SR) images. A discriminator learns to distinguish the real HR images from the generated SR images, and the GAN loss is backpropagated to train both the discriminator and the generator, as shown in Fig. 5. SRGAN uses a perceptual loss that measures the MSE between features extracted by a VGG-19 network: for a specific layer within VGG-19, the features of the SR image should match those of the HR image (minimum feature MSE).
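The two data-side steps described above, creating an LR image from an HR image and comparing SR and HR images in feature space, can be sketched as follows. This is an illustration under stated assumptions, not the SRGAN implementation: average pooling stands in for whatever downsampling kernel is used in practice, a simple image-gradient function (`stand_in_features`, a hypothetical name) stands in for a VGG-19 layer, and a naive pixel-repeat upsampler stands in for the generator output.

```python
import numpy as np

def downsample(hr, factor=4):
    """Average-pool a (H, W) image by `factor` to create the LR input.
    Assumption: average pooling stands in for the real downsampling kernel."""
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    blocks = hr[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def stand_in_features(image):
    """Hypothetical stand-in for a VGG-19 layer: horizontal and vertical
    image gradients stacked as two feature channels."""
    dx = image[:-1, 1:] - image[:-1, :-1]
    dy = image[1:, :-1] - image[:-1, :-1]
    return np.stack([dx, dy])

def feature_mse(sr, hr, extract=stand_in_features):
    """MSE between features of the SR and HR images (the content term)."""
    return float(np.mean((extract(sr) - extract(hr)) ** 2))

rng = np.random.default_rng(0)
hr = rng.random((16, 16))                     # hypothetical HR training image
lr = downsample(hr, factor=4)                 # LR input for the generator
naive_sr = lr.repeat(4, axis=0).repeat(4, axis=1)  # placeholder "generator" output

print(lr.shape, feature_mse(naive_sr, hr), feature_mse(hr, hr))
```

In a real training loop, the feature MSE would be combined with the adversarial loss from the discriminator, and the generator would be updated to reduce both.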

Figure 5. Basic SRGAN architecture

However, on the Airbus dataset, super resolution showed no improvement in performance. This is most likely because image quality was not the limiting factor for this dataset. The comparison table is shown in Fig. 6.

Figure 6. Comparison Table

Small object detection is a challenging problem in computer vision, and this article showcases one of the many ways to approach it. Feature Pyramid Networks show significant improvement over more popular object detection methods such as YOLOv3 and thus show promise in the domain of small object detection. Small object detection has been widely applied in defense, military, transportation, and industry. It is extensively used in self-driving cars to recognize street signs and pedestrians from a long way away and avoid accidents. Another major application is in the manufacturing industry, where detecting a small defect early in assembly costs far less in repairs or replacement than finding the same defect at a later stage of the assembly process. SRGAN may not have improved performance on the Airbus dataset, but it should not be dismissed when working on detecting small objects in lower quality images.

At AlgoAnalytics, we have used innovative techniques for small object detection in satellite imagery using Feature Pyramid Networks and have created a demo of this work.