Gesture Perception¶

the video above shows the demonstration of the gesture perception node which was developed to perceive the start command and pointed posture

Overview¶

The purpose of gesture perceptual application for sensing human behavior, living with the human the robot must know human behavior and interact with humans. The posture can specify many things whether emotion, desire, or action so gesture perception is necessary for living with humans. And the minor purpose is to command the robot as we say the posture can specify the desire. The basic way to command the robot is the posture and we can adjust the special command following the desire of the developer however it depends on the suitability to use in real life

ros2 system architecture¶

The gesture detection node consists of 2 services for calling from the behavior root node while the detect sequence is occurs

Installation¶

CV bridge LTS on ubuntu

sudo apt-get install ros-(ROS version name)-cv-bridge
sudo apt-get install ros-(ROS version name)-vision-opencv

Tensorflow==2.9.1

see the installation reference on this link: https://www.tensorflow.org/install

Example¶

First clone the repository from GitHub following this command

git clone https://github.com/MBSE-2022-1/Software-Team.git -b gesture-perception

Build the package (navigate to workspace directory before build)
colcon build --symlink-install
Note

Always navigate to workspace directory before build and symlink-install is necessary

Run the package

ros2 run gesture_detection gesturedetection.py

Open realsense camera

ros2 launch realsense2_camera rs_launch.py \
            rgb_camera.profile:=640x480x30 \
            depth_module.profile:=640x480x30 \
            pointcloud.enable:=true

Call service

ros2 service call <service name> std_srvs/srv/Empty

API Reference¶

The gesture detection function consist of image subscriber, extract image feature, preprocessing and classifier model

Mediapipe

the image feature is extracted by mediapipe library Mediapipe hand landmarks are composed of x, y, and z. x and y are normalized to [0.0, 1.0] by the image width and height respectively. z represents the landmark depth with the depth at the wrist being the origin, and the smaller the value the closer the landmark is to the camera. The magnitude of z uses roughly the same scale as x. The preprocessing function will set the wrist position as the origin point and then subtract the other 20 points from the origin point then normalize the position

See the reference API here: https://google.github.io/mediapipe/solutions/hands.html#python-solution-api
preprocessing function

.. calc_landmark_list(self, landmarks)::¶

The hand landmarks from mediapipe are normalized [0.0, 1.0] this function will convert the normalized value to the picture position

Parameters:

landmarks: normalize hand landmarks from the result of mediapipe

Return:

The same size of the input array respective to the image size with format [x, y, z]

.. pre_process_landmark(self, landmark_list)::¶

This function using for preprocessing the hand landmark by subtracting all hand keypoint with the wrist position value and chaining the position x, y and z together

Parameters:

landmark_list: list of hand landmarks respective to the image size in the format [x, y, z]

Return:

a dimension list of scale hand landmarks x follow by y and z position

.. calc_bounding_rect(self, landmarks)::¶

this function calculates the landmarks from mediapipe for the bounding box for debugging with the image

Parameters:

landmarks: normalize hand landmarks from the result of mediapipe

Return:

[x, y, x + w, y + h] respectively to the image size
Classifier model architecture

Input: 42 length arrays

Output: hand class [‘Open’, ‘Start_cmd’, ‘Pointer’, ‘Close’, ‘OK’]

Subsystem Verification¶

Detection range¶

On the first version of the robot, the camera has 160 cm height from the floor which means the maximum detect range should be more than 230 cm from our calculation with Realsense d455 extrinsic matrix

test condition
- this testing result using estimate distance with 2.5D anchors
- min confidence and tracking of mediapipe is 0.5
result

Processing time testing¶

test condition
- Running with CPU Intel NUC on the robot
result
- Each node using 300 MB memory
- Each node uses 10% CPU when processing with a max frequency

Problem and future plan¶

Problem¶

The classifier model has low performance
Detecting the hand only in front of the robot
If there are many hands in the camera plane there is no indicator to detect