Real Time Object Detection in Surveillance Cameras with Distance Estimation using Parallel Implementation

Object detection is shaping not only how computers see and analyze scenes but also how systems react to changes in their environment. Its main applications are locating an object in space and tracking its movement. Object detection has many use cases; in this paper, we introduce an application that supports the safety of people who are stuck in a disaster and need to be evacuated. In such settings, the main obstacles to overcome are camera noise, saturation, and image compression artifacts. Our solution establishes a connection between the person caught in the disaster and fire safety personnel. It is built on a convolutional network that detects vulnerable people or objects inside a room that need to be rescued, and it can also flag explosives inside the room. Our model uses Faster R-CNN pretrained on the COCO dataset, which allows real-time object detection and classification on our network. Using this, we were able to detect an object or a person and direct them to safety by providing the shortest way out of the building. Our object detection model achieved an accuracy of more than 75%.

  • Author Keywords

    • Object Detection
    • RCNN
    • Convolutional Neural Networks
    • Distributed Computing
    • Parallel Processing


Object detection is a very powerful tool, but it can be hard to implement and to get the best results from. It is done using two kinds of networks: traditional neural networks and convolutional neural networks (CNNs). The main difference between the two is how they take an image as input. Traditional neural networks are trained on vectors, that is, they work from a two-dimensional view of the image, whereas a CNN takes a tensor whose dimensions are height, width, and depth. Moreover, in traditional methods an image was converted to grayscale before computation. In real-time detection on video, however, color is a major classifying feature, so the CNN takes color as part of its input. The input tensor for an image has H rows, W columns, and 3 channels (R, G, B). The input passes through a sequence of layers. Many types of layers process the image and extract details from it; common ones are the convolution layer, pooling layer, ReLU layer, and loss layer.

Many advances in object detection have been driven by the success of region proposal methods. In this model, we use Faster R-CNN, which attains real-time rates using deep neural networks. An image can contain multiple objects of interest, and Faster R-CNN allows real-time detection of multiple objects, so several people, animals, birds, and other objects of interest can be detected at once. Each detected object is enclosed in a rectangular bounding box that defines its location and extent. The overlap between the predicted bounding box and the ground-truth bounding box, measured as the area of their intersection divided by the area of their union, is known as Intersection over Union (IoU). The IoU obtained depends on parameters such as the feature extraction method, sliding window size, and video quality. An IoU value over 0.5 is generally considered a good detection, although this threshold changes with the severity of the situation.

The CNN takes advantage of the GPU to process these images. A GPU has thousands of cores that work in parallel, whereas a CPU has a very limited number of cores. The larger number of cores increases parallelism and hence computation speed, to the point that processing takes 0.2 s per image or even less.
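As a small illustration of the IoU measure described above, the sketch below computes IoU for two axis-aligned boxes and applies the 0.5 threshold. The (x1, y1, x2, y2) corner format and the function names `iou` and `is_good_detection` are assumptions made for illustration; they are not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption for this sketch.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate boxes with zero area.
    return inter / union if union > 0 else 0.0


def is_good_detection(predicted, ground_truth, threshold=0.5):
    """Treat a detection as good when IoU exceeds the threshold.

    The default of 0.5 follows the text; stricter scenarios may raise it.
    """
    return iou(predicted, ground_truth) > threshold
```

For example, a predicted box (2, 0, 10, 10) against a ground-truth box (0, 0, 10, 10) overlaps on 80 of 100 area units, giving an IoU of 0.8, which passes the 0.5 threshold.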


Every life matters, and in disasters and attacks it becomes a high priority to get civilians to safety. To do so, they need a mode of communication, but these civilians are trapped inside a building or structure, and in most such cases telecommunication fails as well. One way to implement this communication is through sign language, which our security cameras can read and interpret in order to tell the responsible authority what steps need to be taken. As the saying goes, safety doesn’t happen by accident: certain steps must be in place in order to take the right action at the right time. Our system supports this safety and security and can be plugged into the existing CCTV or surveillance cameras installed in a building. By means of computer vision, we can create a safer world for all of us. The system can be used not only in disasters but also in the case of an attack on a building or any kind of intrusion. It is designed to get the person to safety as soon as possible: even if the security police are delayed, the trapped people already have a rescue plan that they can use to get to safety. It also detects objects that do not belong in a living environment; for example, the system can be trained to detect explosives. With a detection accuracy of more than 75%, the system helps keep a person safe and in control of the situation. Even in a disaster, our system reveals things that the naked eye could miss and that could otherwise cause panic.

Also, by computing distances in parallel in Dijkstra’s algorithm, we keep the computation real-time and free of noticeable lag. Once calculated, this distance can be used to direct the person to safety. Together, these components put safety first and can spare civilians painful hospital trips.
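The shortest-route step mentioned above can be sketched as follows. This is a minimal, sequential version of Dijkstra's algorithm on a small hypothetical floor plan; the graph, the room names, the distances, and the function name `shortest_exit_path` are illustrative assumptions, and the paper's parallel variant is not reproduced here.

```python
import heapq


def shortest_exit_path(graph, start, exit_node):
    """Return (distance, path) from start to exit_node using Dijkstra.

    graph maps each node to a dict of {neighbor: non-negative distance}.
    Returns (inf, []) when the exit cannot be reached.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()

    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry, a shorter route was already found
        visited.add(node)
        if node == exit_node:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if exit_node not in dist:
        return float("inf"), []

    # Walk the predecessor links backwards to rebuild the route.
    path = [exit_node]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[exit_node], path


# Hypothetical floor plan; edge weights are walking distances in meters.
FLOOR_PLAN = {
    "room": {"hallway": 4.0, "balcony": 2.0},
    "hallway": {"room": 4.0, "stairs": 3.0, "exit": 10.0},
    "balcony": {"room": 2.0},
    "stairs": {"hallway": 3.0, "exit": 5.0},
    "exit": {},
}
```

Calling `shortest_exit_path(FLOOR_PLAN, "room", "exit")` on this sample graph selects the route through the stairs (12 m) over the direct hallway door (14 m). A parallel version would typically distribute the neighbor relaxation step across workers, but how the paper does this is not specified in the text above.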

Full paper:

M. Thakur and S. Banu J., "Real-Time Object Detection in Surveillance Cameras with Distance Estimation using Parallel Implementation," published in 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 2020, pp. 1-6.


Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.