Environment Recognition Using Robot Camera


The purpose of this research is to create an environmental recognition system, focused specifically on emotional expressions and human states, for a humanoid robot intended for interpersonal services. Region-based Convolutional Neural Networks (R-CNN) are often used for detecting objects in the environment. We employ a Mask R-CNN model to detect the emotions and states of a target person in the robot’s field of view. The model was trained using various images of the human body in several emotional states. Experiments were conducted to validate the effectiveness of the model in detecting the states of surrounding people from the robot’s camera. Although the set of human states assumed in the experiments was limited, the results suggest that the proposed method has the potential to serve as the basis of a recognition model for an intelligent humanoid robot for interpersonal services.

Author Keywords

  • Mask R-CNN,
  • human state recognition,
  • image processing,
  • humanoid robot

IEEE Keywords

  • Humanoid robots,
  • Cameras,
  • Robot vision systems,
  • Target recognition,
  • Service robots,
  • Emotion recognition


In recent years, the mechanical engineering technology behind intelligent robots has advanced considerably across many important fields. Together with reliably designed software, this has made it possible for intelligent robots to serve in people’s daily lives [1][2][3]. These robots are called service robots, and they have attracted a great deal of attention in both research and industry. Service robots are distinctly different from traditional industrial robots, which are developed mainly for speed and accuracy of operation, to reduce factory personnel, and to improve the production rate. Instead, service robots are designed to perform work closely related to human life, such as entertainment and work substitution, or to perform collaborative work with humans. They need not only functions such as speed and accuracy of movement but also an appearance and characteristics close to those of humans, so that people do not feel uncomfortable around them. Therefore, the ability of humanoid robots to interact and communicate with people is considered highly important for this kind of collaborative work.

In this study, we focus on one of the most important technologies for humanoid robots: recognizing the communication behavior of a conversation partner [4][5][6]. Like humans in communication, the robot is required to continuously estimate the other party’s condition, such as whether the person is smiling and what body gestures the person is making, in order to react properly. Besides the content of the conversation, these reactions can also greatly influence the quality of an interpersonal service. We adapt the Mask R-CNN detector using learning data created from a large number of human images. With the trained detector, we analyze images in real time as they are acquired from a camera mounted on the humanoid robot, and we conduct an experiment to estimate the human state in the video.
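The real-time analysis described above can be pictured as a loop over camera frames. The following is a minimal sketch under stated assumptions, not the authors' implementation: `estimate_states`, the `detector` callable, the `min_score` threshold, and the label strings are hypothetical names introduced for illustration, and `fake_detector` merely stands in for a trained Mask R-CNN model.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A detection is (label, score). Both this structure and the label names
# are assumptions made for this sketch; they are not taken from the paper.
Detection = Tuple[str, float]
Detector = Callable[[object], List[Detection]]


def estimate_states(
    frames: Iterable[object],
    detector: Detector,
    min_score: float = 0.5,
) -> Iterator[List[str]]:
    """Yield, for each frame, the labels whose score reaches min_score."""
    for frame in frames:
        detections = detector(frame)
        yield [label for label, score in detections if score >= min_score]


def fake_detector(frame: object) -> List[Detection]:
    """Hypothetical stand-in for a trained model; returns canned detections."""
    canned = {
        "frame0": [("smiling", 0.91), ("sitting", 0.40)],
        "frame1": [("neutral", 0.75), ("standing", 0.88)],
    }
    return canned.get(frame, [])


# Three placeholder frames; a real system would read images from the camera.
states = list(estimate_states(["frame0", "frame1", "frame2"], fake_detector))
```

Keeping the detector as a plain callable means the loop itself does not depend on any particular model library, so a real detector could be substituted without changing the surrounding logic.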

The rest of the paper is organized as follows. In Section 2, related works are introduced to present the direction of this research. In Section 3, the proposed method is described. In Section 4, the results of the experiments are presented along with future issues. Conclusions are presented in Section 5.


There is a wide variety of research on object recognition based on image processing. Such studies often aim to recognize a variety of targets, such as people and objects. Service robots equipped with these models are expected to play important roles in many applications, such as video surveillance and monitoring. Many methods have been proposed for object recognition, including image processing [7][8], color and distance sensors [9][10], and the use of information such as skeleton and shape [11][12]. Although these methods have shown highly reliable results in recognizing the target, they require special equipment as well as a large amount of manually input data, which makes it difficult to deal with unknown data. In this research, we propose a method that is highly scalable and can flexibly deal with a large amount of data by combining image processing technology for video analysis with machine learning technology.


In this study, we proposed an environmental recognition method that recognizes the emotion and state of a target person in an interpersonal conversation environment using video acquired by a humanoid robot. Specifically, the proposed method analyzes the video acquired from a camera attached to the eyeball of the humanoid robot and recognizes the facial expression and posture of the person in the field of view. The proposed model obtains head and body regions from the target human image using Mask R-CNN and classifies them based on the contents of these regions. The experimental results show that, in a conversation environment, the humanoid robot can correctly analyze the surrounding people and environment and classify the emotions and states of the conversation partner with high accuracy. In future work, we plan to improve the performance of the model and to evaluate the method in complex environments for practical application.
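One way to picture how head and body regions might be combined into a per-person description is sketched below. This is an illustration under stated assumptions rather than the method from the paper: the `Region` type, the `describe_people` function, the score threshold, and the rule that a head belongs to a body when the head's center lies inside the body's bounding box are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Region:
    kind: str    # "head" or "body" (hypothetical region types)
    label: str   # e.g. "smiling" for a head, "standing" for a body
    score: float
    box: Box


def center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def contains(box: Box, point: Tuple[float, float]) -> bool:
    x0, y0, x1, y1 = box
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1


def describe_people(
    regions: List[Region], min_score: float = 0.5
) -> List[Tuple[Optional[str], str]]:
    """Pair each confident body region with the best head region inside it.

    Assumption: a head belongs to a body if the head's center falls inside
    the body's box. Returns (expression or None, posture) per body.
    """
    kept = [r for r in regions if r.score >= min_score]
    heads = [r for r in kept if r.kind == "head"]
    people = []
    for body in (r for r in kept if r.kind == "body"):
        inside = [h for h in heads if contains(body.box, center(h.box))]
        best = max(inside, key=lambda h: h.score, default=None)
        people.append((best.label if best else None, body.label))
    return people


# Made-up regions for illustration only.
regions = [
    Region("body", "standing", 0.9, (0, 0, 100, 300)),
    Region("head", "smiling", 0.8, (30, 10, 70, 60)),
    Region("head", "neutral", 0.3, (30, 10, 70, 60)),   # below threshold
    Region("body", "sitting", 0.7, (200, 50, 300, 300)),
]
people = describe_people(regions)
```

With the made-up regions above, the first body is paired with the confident "smiling" head, while the second body has no head inside its box and is reported with an expression of `None`.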

About KSRA

The Kavian Scientific Research Association (KSRA) is a non-profit research organization established in December 2013 to provide research and educational services. Its members had originally formed a virtual group on the Viber social network, and the core of the association was formed with these members as founders. These individuals, led by Professor Siavosh Kaviani, decided to launch a scientific research association with an emphasis on education.

As a non-profit research organization, KSRA is committed to providing research services in the field of knowledge. The main beneficiaries of the association are public and private knowledge-based companies, students, researchers, professors, universities, and industrial and semi-industrial centers around the world.

Our main services are based on education for people across the whole spectrum of society, worldwide. We want to integrate research and education. We believe education is a basic human right, so our services concentrate on inclusive education.

The KSRA team partners with local under-served communities around the world to improve the access to and quality of knowledge based on education, amplify and augment learning programs where they exist, and create new opportunities for e-learning where traditional education systems are lacking or non-existent.

Full paper:

H. Sumida, F. Ren, S. Nishide and X. Kang, "Environment Recognition Using Robot Camera," 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), Xiamen, China, 2020, pp. 282-286.




Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.