Abstract
Auxiliary classifiers can improve the performance of classification networks. However, the utility of an auxiliary detection head has not been explored in object detection. In this paper, we propose an auxiliary detection head to boost the performance of one-stage object detectors. Like other detection heads, the auxiliary detection head consists of a classification subnet and a regression subnet, which are essentially two convolution layers, so it is computationally efficient. In addition, the auxiliary detection head achieves implicit two-step cascaded regression: it uses its own output boxes as anchors for further regression. Within the auxiliary detection head, refining object localization corresponds to adjusting the positions of its output boxes towards the ground-truth boxes, which helps the network learn more robust features. At inference, the auxiliary detection head can be removed without any adverse effect on the performance of the main detection head, because the two heads are independent. This brings two advantages: a smaller model and a shorter inference time. The proposed method is evaluated on the Pascal VOC and COCO datasets. By incorporating the auxiliary detection head into a state-of-the-art object detector in parallel with the main detection head, we show consistent improvements on different benchmarks, while no extra parameters are introduced at inference time.
Author Keywords
- Auxiliary detection head
- Two-step cascaded regression
- Convolutional neural networks
- Object detection
IEEE Keywords
- Detectors
- Object detection
- Feature extraction
- Head
- Proposals
- Convolution
- Training
Introduction
Object detection has drawn a great deal of attention recently. It combines classification and localization: a detector reports the classes and locations of the instances in an image. Object detection has a wide range of applications, including face detection [1], pedestrian detection [2], and ship detection [3].
Deep-learning-based object detectors have become very popular and have developed rapidly over a short period. Compared with traditional methods, they extract more robust features with the help of convolutional neural networks (CNNs). Current state-of-the-art detectors can be divided into two types: two-stage detectors and one-stage detectors. The former first generate region proposals, then classify and regress those proposals. Region proposals can be generated by an independent method [4] or by a neural network embedded in the detector [5]. One-stage detectors eliminate the proposal generation step and detect instances in a unified network, which gives them a clear advantage in computational efficiency. Real-time processing also matters in some applications, where it is closely linked with cost and efficiency. Therefore, this paper mainly explores one-stage detectors.
Although one-stage detectors run at a relatively high speed, their performance generally trails that of two-stage detectors. After all, two-stage approaches perform bounding box regression twice. Several methods have been proposed to improve the performance of one-stage detectors. [6]–[8] aggregated contextual information among different feature maps to generate accurate results. [9], [10] focused on training a one-stage object detector from scratch. [11], [12] proposed new loss functions for bounding box regression. One challenging problem in object detection is rough localization. To locate objects accurately, two-step cascaded regression was proposed in [13], where the output boxes of one detection head are passed to another detection head for further refinement. To the best of our knowledge, the use of an auxiliary detection head to enhance one-stage detectors has not yet been studied.
This paper aims to boost the performance of one-stage detectors from a new perspective: the auxiliary detection head. The utility of auxiliary classifiers has been discussed in [14], [15], which showed that auxiliary classifiers can help a network reach a slightly higher plateau. In this paper, a simple yet effective module named the auxiliary detection head (ADH) is proposed to improve the performance of one-stage detectors. Like other prediction heads in a detector, ADH consists of a classification subnet and a regression subnet, which are essentially two convolution layers. Its structure is therefore simple and computationally efficient. In addition, ADH achieves implicit two-step cascaded regression within a single detection head. As mentioned above, the two-step cascaded regression in [13] relies on two branches, with the anchors of one branch initialized by the output boxes of the other. ADH, by contrast, is an independent module that uses its own output boxes to initialize the anchors for further refinement. Within ADH, refining object localization corresponds to adjusting the positions of its output boxes towards the ground-truth boxes, which aids feature extraction and makes ADH effective. The idea is depicted in the red dashed rectangle in Fig. 1 (b).
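The idea of reusing output boxes as anchors can be illustrated with a minimal sketch. It assumes the common center-size box parameterization (center offsets scaled by anchor size, log-scale width and height); the excerpt does not state which encoding ADH uses, so this is an assumption. The function `imperfect_head` is a hypothetical stand-in for a learned regressor that recovers only half of the ideal offsets, and the box coordinates are made-up values.

```python
import math

def encode(anchor, target):
    """Offsets (dx, dy, dw, dh) that move `anchor` onto `target`; boxes are (cx, cy, w, h)."""
    acx, acy, aw, ah = anchor
    tcx, tcy, tw, th = target
    return ((tcx - acx) / aw, (tcy - acy) / ah, math.log(tw / aw), math.log(th / ah))

def decode(anchor, deltas):
    """Apply offsets (dx, dy, dw, dh) to `anchor` and return the resulting box."""
    acx, acy, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (acx + dx * aw, acy + dy * ah, aw * math.exp(dw), ah * math.exp(dh))

def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def iou(box_a, box_b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

def imperfect_head(anchor, gt):
    """Hypothetical regressor: predicts only half of the ideal offsets."""
    return tuple(0.5 * d for d in encode(anchor, gt))

# Made-up example boxes in (cx, cy, w, h) form.
anchor = (50.0, 50.0, 40.0, 40.0)
gt = (60.0, 55.0, 50.0, 30.0)

# Step 1: regress from the original anchor.
box1 = decode(anchor, imperfect_head(anchor, gt))
# Step 2: the step-1 output box is reused as the anchor for further regression.
box2 = decode(box1, imperfect_head(box1, gt))

for name, box in (("anchor", anchor), ("step 1", box1), ("step 2", box2)):
    print(f"{name}: IoU with ground truth = {iou(box, gt):.3f}")
```

In this toy setting each step moves the box closer to the ground truth, which is the intuition behind cascaded regression; how ADH realizes the second step inside one head is described in the paper itself, not in this sketch.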
ADH is easy to use in one-stage detectors. As an auxiliary module, it can be plugged into state-of-the-art object detection frameworks in parallel with the existing prediction branch. When ADH is added to a detector, there are two groups of detection heads: the original detection head and ADH. To distinguish them, the original detection head is named the main detection head; it produces the final prediction. During training, ADH is jointly optimized with the main detection head, and the number of parameters increases slightly. However, ADH can be omitted after training because its output does not affect the main detection head; this follows from the independence of ADH. As a result, the model size at inference does not increase when ADH is used. Experiments on the Pascal VOC and COCO datasets demonstrate the effectiveness of ADH. Once plugged into a one-stage detector, ADH consistently improves its performance by non-negligible margins, while no extra parameters are introduced at inference time. For example, the proposed method surpasses its baseline RefineDet by 1.8% and 1.1% AP with VGG-16 and ResNet-101 backbones, respectively, on COCO. The code will be made publicly available. The main contributions of this paper are summarized as follows.
- ADH is proposed to achieve implicit two-step cascaded regression in one detection head.
- ADH can be plugged into state-of-the-art object detection frameworks in parallel with the existing prediction branch and trained jointly with the original detection head.
- ADH consistently improves the baseline by non-negligible margins while introducing no extra parameters at inference.
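To make the parameter argument concrete, the following sketch counts the parameters of a detection head built from two convolution layers, one for classification and one for regression. The channel count, anchors per location, class count, and 3x3 kernel are hypothetical values chosen for illustration, and the sketch assumes ADH has the same shape as the main head on a single feature map; none of these figures come from the paper.

```python
def conv_params(in_channels, out_channels, kernel_size=3):
    """Weights plus biases of one 2D convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def head_params(in_channels, num_anchors, num_classes, kernel_size=3):
    """Parameters of a head with one classification conv and one regression conv."""
    cls = conv_params(in_channels, num_anchors * num_classes, kernel_size)
    reg = conv_params(in_channels, num_anchors * 4, kernel_size)  # 4 box offsets per anchor
    return cls + reg

# Hypothetical configuration (assumption, not taken from the paper).
IN_CHANNELS = 256
NUM_ANCHORS = 3
NUM_CLASSES = 21  # e.g. 20 Pascal VOC classes plus background

main_head = head_params(IN_CHANNELS, NUM_ANCHORS, NUM_CLASSES)
adh = head_params(IN_CHANNELS, NUM_ANCHORS, NUM_CLASSES)

training_heads = main_head + adh  # both heads are optimized jointly
inference_heads = main_head       # ADH is dropped after training

print("head parameters during training:", training_heads)
print("head parameters at inference:   ", inference_heads)
```

Under these assumptions the head parameters double during training and return to the baseline count at inference, which matches the claim that ADH adds no parameters to the deployed model.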
The remainder of this paper is organized as follows. Section II discusses related work. Section III introduces the proposed method in detail. Section IV presents experimental results and comparisons. Section V draws conclusions.
Conclusion
In this paper, ADH is proposed to improve the performance of current one-stage detectors. It is a simple module that contains only two convolution layers, one for classification and one for regression. In addition, ADH achieves implicit two-step cascaded regression. Unlike conventional two-step cascaded regression, ADH uses its own output boxes as anchors. Within ADH, refining object localization corresponds to adjusting the positions of its output boxes towards the ground-truth boxes, which helps feature extraction. ADH can be regarded as an enhancement module that builds upon state-of-the-art object detection frameworks: it can be plugged into a single-shot detector in parallel with the original detection head. During training, ADH is jointly optimized with the main detection head. At inference, ADH is removed without any adverse effect on the performance of the main detection head, so it serves purely as an auxiliary module. This follows from the independence of ADH and brings two advantages: a smaller model and a shorter inference time. Experiments on the Pascal VOC and COCO datasets demonstrate the effectiveness of ADH. By incorporating ADH into a one-stage detector, we show consistent improvements in its performance without introducing any parameters at inference time.
In the future, we plan to introduce ADH into two-stage detectors. We believe that ADH is beneficial to the research of object detection.
Bibliography
- Title: Auxiliary Detection Head for One-Stage Object Detection
- Year: 2020
- Published in: IEEE Access, vol. 8, pp. 85740-85749, 2020
- DOI: 10.1109/ACCESS.2020.2992532