nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.MMPose¶

class nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.MMPose(config, io, data)[source]¶

Bases: BaseDetector

The MMPose class is a method detector for pose estimation using the MMPose framework.

This class is designed to perform pose estimation on video data using the MMPose framework, which is an open-source toolbox for comprehensive pose estimation. It provides the necessary preparations and post-inference utilities to integrate the MMPose framework into our pipeline.

The MMPose class supports 2D pose estimation and can handle data from multiple cameras. It also has the capability to perform 3D pose estimation by triangulating the 2D pose data from different camera views. Additionally, it offers the option to filter the 3D pose data to reduce noise and smooth out the results.

Available components: body_joints, hand_joints, face_landmarks

camera_names¶

Names of the cameras used to capture the video data.

Type:: list of str

video_start¶

The frame number from which to start pose estimation.

Type:: int

filtered¶

Indicates whether the 3D pose data should be filtered.

Type:: bool

filter_window_length¶

Specifies the length of the window used for filtering the 3D data.

Type:: int

filter_polyorder¶

Defines the order of the polynomial used in the filtering process.

Type:: int
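The names filter_window_length and filter_polyorder match the parameters of a Savitzky-Golay filter, so the following is a minimal sketch of such smoothing, assuming (this page does not confirm it) that the filter is applied along the time axis with scipy.signal.savgol_filter. The function name smooth_keypoints, the array layout, and the default values are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_keypoints(points, window_length=11, polyorder=3):
    """Smooth a (n_frames, n_keypoints, n_dims) array along the time axis.

    Assumed layout; the toolbox's actual array shape may differ.
    """
    points = np.asarray(points, dtype=float)
    # savgol_filter needs polyorder < window_length, and with the default
    # mode="interp" the window must not exceed the number of frames.
    return savgol_filter(points, window_length=window_length,
                         polyorder=polyorder, axis=0)


# Synthetic data: one keypoint moving along a straight line, plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
clean = np.stack([t, 2.0 * t, 3.0 * t], axis=-1)[:, None, :]  # (50, 1, 3)
noisy = clean + rng.normal(scale=0.05, size=clean.shape)
smoothed = smooth_keypoints(noisy, window_length=11, polyorder=3)
```

A longer window smooths more aggressively, while a higher polynomial order follows fast movements more closely at the cost of keeping more noise.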

data_folder¶

The file path to the folder containing the input data.

Type:: str

out_folder¶

The file path to the folder where the pose estimation results will be saved.

Type:: str

prediction_folders¶

A dictionary mapping camera names to their respective folders for storing the pose estimation predictions.

Type:: dict of str: str

image_folders¶

A dictionary mapping camera names to their respective folders containing the input images or frames for pose estimation.

Type:: dict of str: str

result_folders¶

A dictionary mapping camera names to their respective folders for storing the final pose estimation results.

Type:: dict of str: str

calibration¶

Contains the calibration parameters for the cameras used in pose estimation.

Type:: dict

Initializes the MMPose class with configuration settings and IO handling capabilities.

This constructor takes care of all inference preparation steps, including setting up the input and output folders, configuring the keypoint mapping, and preparing the calibration parameters for 3D pose estimation.

Parameters:

config (dict) – A dictionary containing the configuration settings for the method detector.
io (class) – An instance of the IO class for input-output operations.
data (class) – An instance of the Data class for accessing data.

Methods

`get_image_folders`	Generate and return a dictionary of image folders for each camera.
`get_per_component_keypoint_mapping`	This method extracts the keypoint indices and descriptions for each pose estimation component.
`get_prediction_folders`	Generate and return a dictionary of prediction folders for each camera.
`post_inference`	Post-inference processing for pose estimation components such as body_joints, hand_joints, and face_landmarks.
`run_inference`	Runs the inference of the method detector in a separate terminal/cmd window using the specified virtual environment or conda environment.
`visualization`	Generates a visualization video for each camera from the processed image frames.

Attributes

`algorithm`	Abstract property that returns the algorithm of the method detector.
`components`	Abstract property that returns the components of the method detector.

abstract property algorithm: str¶

Abstract property that returns the algorithm of the method detector.

This property should be implemented in the derived classes to specify the algorithm that the method detector is associated with.

Returns:: A string representing the algorithm associated with the method detector
Return type:: str
Raises:: NotImplementedError – If the property is not set in the derived classes.

abstract property components: list[str]¶

Abstract property that returns the components of the method detector.

This property should be implemented in the derived classes to specify the components that the method detector is associated with.

Returns:

A list of strings representing the components associated with the method detector.

Return type:

list

Raises:

NotImplementedError – If the property is not set in the derived classes.
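The abstract-property pattern described for algorithm and components can be sketched as follows. This is an illustration only: BaseDetectorSketch and HRNetSketch are hypothetical stand-ins, not the toolbox's BaseDetector or its derived classes, and the returned values are taken from the names listed on this page.

```python
from abc import ABC, abstractmethod


class BaseDetectorSketch(ABC):
    """Hypothetical stand-in for a base detector with abstract properties."""

    @property
    @abstractmethod
    def algorithm(self) -> str:
        raise NotImplementedError

    @property
    @abstractmethod
    def components(self) -> list:
        raise NotImplementedError


class HRNetSketch(BaseDetectorSketch):
    """Hypothetical derived class that sets both properties."""

    @property
    def algorithm(self) -> str:
        return "hrnetw48"

    @property
    def components(self) -> list:
        return ["body_joints", "hand_joints", "face_landmarks"]


detector = HRNetSketch()
print(detector.algorithm, detector.components)
```

Note that with Python's abc module, instantiating a class that leaves an abstract property unset fails with TypeError at construction time; the NotImplementedError documented above would only surface if the base implementation were called explicitly.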

get_image_folders(make_dirs=False)[source]¶

Generate and return a dictionary of image folders for each camera.

Parameters:

make_dirs (bool) – If True, the function will create the directories if they do not exist.

Returns:

A dictionary where the keys are the camera names and the values are the corresponding image folder paths.

Return type:

dict
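A minimal sketch of how a per-camera folder dictionary with a make_dirs switch might be built. The helper build_camera_folders, the camera names, and the one-subfolder-per-camera layout are assumptions for illustration; the actual folder layout used by get_image_folders is not shown on this page.

```python
import os
import tempfile


def build_camera_folders(base_folder, camera_names, make_dirs=False):
    """Return {camera_name: folder_path}, optionally creating the folders."""
    folders = {}
    for name in camera_names:
        path = os.path.join(base_folder, name)
        if make_dirs:
            # exist_ok=True makes repeated calls safe.
            os.makedirs(path, exist_ok=True)
        folders[name] = path
    return folders


with tempfile.TemporaryDirectory() as tmp:
    folders = build_camera_folders(tmp, ["cam1", "cam2"], make_dirs=True)
    created = all(os.path.isdir(path) for path in folders.values())
```

With make_dirs=False the function only computes the paths, which is useful when the folders are expected to exist already.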

abstract get_per_component_keypoint_mapping(keypoints_indices)[source]¶

This method extracts the keypoint indices and descriptions for each pose estimation component.

It must be implemented by the derived classes for each available pose estimation algorithm (see the HRNetw48 and Vitpose classes below).

Available algorithms are: hrnetw48, vitpose

Parameters:: keypoints_indices (_type_) – _description_
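Since the parameter is not documented here, the following is only a sketch of what a per-component mapping could look like. It assumes, hypothetically, that keypoints_indices is a dictionary of the form {component: {keypoint_name: index}}; the function name, the output keys, and the example names and indices are all illustrative and may not match the toolbox.

```python
def per_component_mapping(keypoints_indices):
    """Split an assumed {component: {name: index}} dict into index and
    description lists per component, ordered by keypoint index."""
    mapping = {}
    for component, named_indices in keypoints_indices.items():
        ordered = sorted(named_indices.items(), key=lambda item: item[1])
        mapping[component] = {
            "indices": [index for _, index in ordered],
            "descriptions": [name for name, _ in ordered],
        }
    return mapping


# Illustrative input only; real keypoint names and indices depend on the model.
example = {
    "body_joints": {"left_eye": 1, "nose": 0, "right_eye": 2},
    "hand_joints": {"right_wrist": 10, "left_wrist": 9},
}
mapping = per_component_mapping(example)
```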

get_prediction_folders(make_dirs=False)[source]¶

Generate and return a dictionary of prediction folders for each camera.

Parameters:

make_dirs (bool) – If True, the function will create the directories if they do not exist.

Returns:

A dictionary where the keys are the camera names and the values are the corresponding prediction folder paths.

Return type:

dict

post_inference()[source]¶

Post-inference processing for pose estimation components such as body_joints, hand_joints, and face_landmarks.

This method takes the raw 2D pose estimation results and applies a series of processing steps. They include optional filtering to smooth the results, interpolation to fill in missing values, undistortion using camera calibration parameters, and 3D triangulation from multiple camera views. The final processed results are saved for further analysis and visualization for each of the components.

Steps:

1. Filtering: Applies a smoothing filter to the 2D pose estimation results if filtering is enabled. This step reduces noise and improves the consistency of the pose data over time.
2. Interpolation: Fills in missing values in the 2D pose estimation results. This is crucial for maintaining the integrity of the pose data, especially in cases where occlusion or poor lighting conditions may lead to incomplete detections.
3. Undistortion: Corrects the 2D pose estimation results for lens distortion using the camera’s calibration parameters.
4. 3D Triangulation: Uses the undistorted 2D pose estimation results from at least two camera views to reconstruct the 3D positions of the pose keypoints.
5. Saving Results: The processed 3D pose data is saved to a .npz file with the following structure:

   ‘2d’: A numpy array containing the 2D pose estimation results.

   ‘2d_filtered’: A numpy array containing the filtered 2D pose estimation results.

   ‘2d_interpolated’: A numpy array containing the interpolated 2D pose estimation results.

   ‘bbox_2d’: A numpy array containing the 2D bounding box coordinates.

   ‘3d’: A numpy array containing the 3D pose estimation results.

   ‘data_description’: A dictionary containing the data description for the above output numpy arrays. See the documentation of the output for more details.
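A .npz file with the keys listed above can be read back with numpy. The sketch below first writes a toy file so that it runs on its own; the file name, the array shapes, and the contents of data_description are assumptions for illustration, not the toolbox's actual output. Because data_description is a dictionary, numpy stores it as a pickled object array, so loading it needs allow_pickle=True and .item().

```python
import os
import tempfile

import numpy as np

# Toy arrays with assumed shapes (cameras, frames, keypoints, coordinates).
n_cams, n_frames, n_kpts = 2, 4, 3
arrays = {
    "2d": np.zeros((n_cams, n_frames, n_kpts, 2)),
    "2d_filtered": np.zeros((n_cams, n_frames, n_kpts, 2)),
    "2d_interpolated": np.zeros((n_cams, n_frames, n_kpts, 2)),
    "bbox_2d": np.zeros((n_cams, n_frames, 4)),
    "3d": np.zeros((n_frames, n_kpts, 3)),
}
description = {"3d": "illustrative axis description"}  # hypothetical content

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_output.npz")  # hypothetical file name
    np.savez(path, data_description=description, **arrays)

    # Only enable allow_pickle for files from a trusted source.
    with np.load(path, allow_pickle=True) as loaded:
        keys = sorted(loaded.files)
        pose_3d = loaded["3d"]
        loaded_description = loaded["data_description"].item()
```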

Returns:: None. The processed results are saved to the output folder (See step 5).
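The interpolation and triangulation steps described above can be sketched with numpy alone. This is not the toolbox's implementation: it assumes linear interpolation over frames for missing values and a standard linear (DLT) triangulation from 3x4 projection matrices, and it skips undistortion by using ideal, distortion-free synthetic cameras. The function names and camera parameters are hypothetical.

```python
import numpy as np


def interpolate_missing(track):
    """Linearly fill NaN entries in a 1D coordinate track over frames."""
    track = np.asarray(track, dtype=float).copy()
    missing = np.isnan(track)
    if missing.all() or not missing.any():
        return track
    frames = np.arange(track.size)
    track[missing] = np.interp(frames[missing], frames[~missing],
                               track[~missing])
    return track


def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 projection matrices, one per camera
    points_2d:   list of undistorted (x, y) pixel coordinates, one per camera
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    # The homogeneous solution is the right singular vector belonging to
    # the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]


def project(P, point_3d):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]


# Two synthetic cameras sharing intrinsics K, offset along the x axis.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point = np.array([0.2, -0.1, 4.0])
recovered = triangulate_point([P1, P2],
                              [project(P1, point), project(P2, point)])
filled = interpolate_missing([0.0, np.nan, 2.0, np.nan, np.nan, 5.0])
```

With more than two cameras, the same function stacks two extra rows per view, which gives a least-squares estimate when the 2D detections are noisy.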

run_inference() → None¶

Runs the inference of the method detector in a separate terminal/cmd window using the specified virtual environment or conda environment. Captures the output and logs the success or failure of the inference.
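Running a script in another environment and logging the outcome might look like the sketch below. It is an assumption-laden illustration: build_command and run_and_report are hypothetical helpers, the script and environment names are placeholders, and the sketch runs the command as a child process rather than opening a separate terminal window.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def build_command(env_type, env_name, script, args):
    """Build a command line that runs `script` inside another environment."""
    if env_type == "conda":
        # `conda run -n <env>` executes a command inside a named conda env.
        return ["conda", "run", "-n", env_name, "python", script, *args]
    if env_type == "venv":
        # POSIX layout; on Windows the interpreter sits under Scripts\.
        return [f"{env_name}/bin/python", script, *args]
    raise ValueError(f"unknown environment type: {env_type}")


def run_and_report(command):
    """Run the command, capture its output, and log success or failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    succeeded = result.returncode == 0
    if succeeded:
        logger.info("Inference finished successfully.")
    else:
        logger.error("Inference failed: %s", result.stderr.strip())
    return succeeded, result.stdout


# Placeholder names, shown only to illustrate the resulting command line.
conda_command = build_command("conda", "example_env", "inference.py",
                              ["config.toml"])

# Demonstration with the current interpreter so the sketch runs anywhere.
succeeded, output = run_and_report([sys.executable, "-c", "print('done')"])
```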

visualization(data)[source]¶

Generates a visualization video for each camera from the processed image frames.

This method takes the processed image frames for each camera and compiles them into a video file. It uses the frames_to_video() function from utils.video.py. The success of the video creation is tracked, and the method logs the outcome of the visualization process for each camera.

Parameters:

data – An instance of a class that stores all data-related information, including the frame rate (fps) for the video and the starting frame number (video_start).

Note

The method assumes that the processed image frames are named in a specific format (%09d.jpg), where each frame’s name is a zero-padded nine-digit number representing its sequence in the video.
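The %09d.jpg pattern pads the frame number with zeros to a width of nine digits. A small sketch of generating such names follows; frame_path and the folder name are hypothetical, and how the toolbox numbers its frames is not specified here.

```python
import os


def frame_path(folder, frame_index):
    """Return the path of a frame named with the %09d.jpg pattern."""
    return os.path.join(folder, "%09d.jpg" % frame_index)


names = [os.path.basename(frame_path("frames", i)) for i in (0, 42, 1500)]
print(names)  # ['000000000.jpg', '000000042.jpg', '000001500.jpg']
```

Because every name has the same width, sorting the file names alphabetically also sorts the frames in numeric order.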