nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework

Integration of the MMPose framework into the NICE toolbox pipeline.

Functions

extract_key_per_value

Extracts keys from a dictionary based on the type of their values.

Classes

HRNetw48

HRNetw48 is a subclass of MMPose specialized for pose estimation using the HRNetw48 model.

MMPose

The MMPose class is a method detector for pose estimation using the MMPose framework.

VitPose

VitPose is a subclass of MMPose specialized for pose estimation using the Vision Transformer (ViT) model.

class nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.HRNetw48(config, io, data)[source]

HRNetw48 is a subclass of MMPose specialized for pose estimation using the HRNetw48 model.

The HRNetw48 class is designed to utilize the HRNetw48 model within the MMPose framework for pose estimation tasks. It provides the necessary component keypoint indices and descriptions for the body joints component. The base_detector class starts the algorithm as a subprocess within the base_detector’s ‘run_inference’ method.

HRNetw48, or High-Resolution Network, is a deep neural network designed for human pose estimation that maintains high-resolution representations through the whole process. It is particularly effective for tasks requiring precise localization of body joints.

Components: body_joints, hand_joints, face_landmarks

Initializes the MMPose class with configuration settings and IO handling capabilities.

This constructor takes care of all inference preparation steps, including setting up the input and output folders, configuring the keypoint mapping, and preparing the calibration parameters for 3D pose estimation.

Parameters:
  • config (dict) – A dictionary containing the configuration settings for the method detector.

  • io (class) – An instance of the IO class for input-output operations.

  • data (class) – An instance of the Data class for accessing data.

get_per_component_keypoint_mapping(keypoints_indices)[source]

Extracts and returns the indices and descriptions of keypoints for each component.

Parameters:

keypoints_indices (dict) – A dictionary containing the keypoint indices for each component. The keys are the component names (‘body_joints’, ‘hand_joints’, ‘face_landmarks’), and each value is a dictionary holding the indices of that component's keypoints.

Returns:

A tuple containing two dictionaries.
  • The first dictionary contains the indices of keypoints for each component.

  • The second dictionary contains the descriptions of keypoints for each component.

Return type:

tuple
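The input and output shapes described above can be illustrated with a minimal sketch. It is not the toolbox's implementation: the helper name `split_keypoint_mapping`, the keypoint names and the index values are all hypothetical, and it assumes every value is a plain integer index.

```python
def split_keypoint_mapping(keypoints_indices):
    """Hypothetical helper: split {component: {name: index}} into two dicts."""
    indices = {}
    descriptions = {}
    for component, name_to_index in keypoints_indices.items():
        # Sort by index so each description lines up with its index.
        ordered = sorted(name_to_index.items(), key=lambda item: item[1])
        indices[component] = [index for _, index in ordered]
        descriptions[component] = [name for name, _ in ordered]
    return indices, descriptions


# Made-up keypoint names and indices, for illustration only.
example = {
    "body_joints": {"nose": 0, "left_eye": 1, "right_eye": 2},
    "hand_joints": {"left_wrist": 91, "right_wrist": 112},
}
indices, descriptions = split_keypoint_mapping(example)
```

Both returned dictionaries are keyed by component name, which matches the tuple of two dictionaries documented under Returns.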

class nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.MMPose(config, io, data)[source]

The MMPose class is a method detector for pose estimation using the MMPose framework.

This class is designed to perform pose estimation on video data using the MMPose framework, which is an open-source toolbox for comprehensive pose estimation. It provides the necessary preparations and post-inference utilities to integrate the MMPose framework into our pipeline.

The MMPose class supports 2D pose estimation and can handle data from multiple cameras. It also has the capability to perform 3D pose estimation by triangulating the 2D pose data from different camera views. Additionally, it offers the option to filter the 3D pose data to reduce noise and smooth out the results.
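As an illustration of triangulating 2D detections from several views, the following sketch uses linear (DLT) triangulation with NumPy. The toolbox may use a different method; the function name `triangulate_point` and both projection matrices are assumptions made for this example only.

```python
import numpy as np


def triangulate_point(projection_matrices, points_2d):
    """Linear (DLT) triangulation of one keypoint seen in two or more views."""
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, point_3d):
    """Project a 3D point with a 3x4 projection matrix."""
    x = P @ np.append(point_3d, 1.0)
    return x[:2] / x[2]


# Two hypothetical cameras: identity intrinsics, the second shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point_3d = np.array([0.5, 0.2, 4.0])
observations = [project(P1, point_3d), project(P2, point_3d)]
reconstructed = triangulate_point([P1, P2], observations)
```

With noise-free observations the reconstructed point matches the original; with real detections the result is a least-squares estimate.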

Available components: body_joints, hand_joints, face_landmarks

camera_names

Names of the cameras used to capture the video data.

Type:

list of str

video_start

The frame number from which to start pose estimation.

Type:

int

filtered

Indicates whether the 3D pose data should be filtered.

Type:

bool

filter_window_length

Specifies the length of the window used for filtering the 3D data.

Type:

int

filter_polyorder

Defines the order of the polynomial used in the filtering process.

Type:

int

data_folder

The file path to the folder containing the input data.

Type:

str

out_folder

The file path to the folder where the pose estimation results will be saved.

Type:

str

prediction_folders

A dictionary mapping camera names to their respective folders for storing the pose estimation predictions.

Type:

dict of str: str

image_folders

A dictionary mapping camera names to their respective folders containing the input images or frames for pose estimation.

Type:

dict of str: str

result_folders

A dictionary mapping camera names to their respective folders for storing the final pose estimation results.

Type:

dict of str: str

calibration

Contains the calibration parameters for the cameras used in pose estimation.

Type:

dict

Initializes the MMPose class with configuration settings and IO handling capabilities.

This constructor takes care of all inference preparation steps, including setting up the input and output folders, configuring the keypoint mapping, and preparing the calibration parameters for 3D pose estimation.

Parameters:
  • config (dict) – A dictionary containing the configuration settings for the method detector.

  • io (class) – An instance of the IO class for input-output operations.

  • data (class) – An instance of the Data class for accessing data.

get_image_folders(make_dirs=False)[source]

Generate and return a dictionary of image folders for each camera.

Parameters:

make_dirs (bool) – If True, the function will create the directories if they do not exist.

Returns:

A dictionary where the keys are the camera names and the values are

the corresponding image folder paths.

Return type:

dict
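A minimal sketch of the documented behavior, building a dictionary of per-camera folders and optionally creating them. The helper name `build_camera_folders`, the camera names and the one-sub-folder-per-camera layout are assumptions; the actual folder structure used by the toolbox may differ.

```python
import os
import tempfile


def build_camera_folders(base_folder, camera_names, make_dirs=False):
    """Hypothetical helper: map each camera name to a sub-folder path."""
    folders = {}
    for name in camera_names:
        folder = os.path.join(base_folder, name)
        if make_dirs:
            # exist_ok=True keeps the call safe when the folder already exists.
            os.makedirs(folder, exist_ok=True)
        folders[name] = folder
    return folders


# Use a temporary directory so the example leaves real data untouched.
base = tempfile.mkdtemp()
folders = build_camera_folders(base, ["cam_front", "cam_side"], make_dirs=True)
```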

abstract get_per_component_keypoint_mapping(keypoints_indices)[source]

This method extracts the keypoint indices and descriptions for each pose estimation component.

It has to be implemented by the derived classes associated with the available pose estimation algorithms (see the HRNetw48 and VitPose classes).

Available algorithms are: hrnetw48, vitpose

Parameters:

keypoints_indices (dict) – A dictionary containing the indices of keypoints for each component.

get_prediction_folders(make_dirs=False)[source]

Generate and return a dictionary of prediction folders for each camera.

Parameters:

make_dirs (bool) – If True, the function will create the directories if they do not exist.

Returns:

A dictionary where the keys are the camera names and the values are

the corresponding prediction folder paths.

Return type:

dict

post_inference()[source]

Post-inference processing for pose estimation components such as body_joints, hand_joints, and face_landmarks.

This method takes the raw 2D pose estimation results and applies a series of processing steps. They include optional filtering to smooth the results, interpolation to fill in missing values, undistortion using camera calibration parameters, and 3D triangulation from multiple camera views. The final processed results are saved for further analysis and visualization for each of the components.

Steps:

  1. Filtering: Applies a smoothing filter to the 2D pose estimation results if filtering is enabled. This step reduces noise and improves the consistency of the pose data over time.

  2. Interpolation: Fills in missing values in the 2D pose estimation results. This is crucial for maintaining the integrity of the pose data, especially in cases where occlusion or poor lighting conditions may lead to incomplete detections.

  3. Undistortion: Corrects the 2D pose estimation results for lens distortion using the camera’s calibration parameters.

  4. 3D Triangulation: Uses the undistorted 2D pose estimation results from at least two camera views to reconstruct the 3D positions of the pose keypoints.

  5. Saving Results: The processed 3D pose data is saved to a .npz file with the following structure:
    • ‘2d’: A numpy array containing the 2D pose estimation results.

    • ‘2d_filtered’: A numpy array containing the filtered 2D pose estimation results.

    • ‘2d_interpolated’: A numpy array containing the interpolated 2D pose estimation results.

    • ‘bbox_2d’: A numpy array containing the 2D bounding box coordinates.

    • ‘3d’: A numpy array containing the 3D pose estimation results.

    • ‘data_description’: A dictionary containing the data description for the above output numpy arrays. See the documentation of the output for more details.

Returns:

None. The processed results are saved to the output folder (See step 5).
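The interpolation step can be sketched for a single coordinate track as follows. This assumes that missing detections are marked as NaN and that gaps are filled linearly; the source specifies neither, and the function name `interpolate_missing` is hypothetical.

```python
import numpy as np


def interpolate_missing(track):
    """Linearly fill NaN entries of a 1D coordinate track over frames."""
    track = np.asarray(track, dtype=float)
    frames = np.arange(track.size)
    valid = ~np.isnan(track)
    if not valid.any():
        # Nothing to interpolate from; return the track unchanged.
        return track.copy()
    # np.interp evaluates the piecewise-linear curve through the valid samples.
    return np.interp(frames, frames[valid], track[valid])


# x-coordinate of one keypoint over five frames, two detections missing.
x = [10.0, np.nan, np.nan, 16.0, 18.0]
filled = interpolate_missing(x)
```

Note that `np.interp` holds the first or last valid value constant for gaps at the start or end of a track, which may or may not be the desired behavior.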

visualization(data)[source]

Generates a visualization video for each camera from the processed image frames.

This method takes the processed image frames for each camera and compiles them into a video file. It uses the frames_to_video() function from utils.video.py. The success of the video creation is tracked, and the method logs the outcome of the visualization process for each camera.

Parameters:

data (class) – An instance of a class that stores all data-related information, including the frame rate (fps) for the video and the starting frame number (video_start).

Note

  • The method assumes that the processed image frames are named in a specific format (%09d.jpg), where each frame’s name is a zero-padded nine-digit number representing its sequence in the video.
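The `%09d.jpg` pattern can be checked with standard Python string formatting, which pads the frame number with zeros to a width of nine digits. The frame index below is an arbitrary example value.

```python
frame_index = 42  # arbitrary example frame number

# "%09d" pads the integer with leading zeros to a total width of nine digits.
filename = "%09d.jpg" % frame_index
print(filename)  # 000000042.jpg
```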

class nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.VitPose(config, io, data)[source]

VitPose is a subclass of MMPose specialized for pose estimation using the Vision Transformer (ViT) model.

The VitPose class is designed to utilize the ViT model within the MMPose framework for pose estimation tasks, focusing on body joints. It provides the necessary component keypoint indices and descriptions for the body joints component. The base_detector class starts the algorithm as a subprocess within the base_detector’s ‘run_inference’ method.

VitPose leverages the Vision Transformer architecture, a model that applies the transformer mechanism to image processing tasks, including pose estimation.

Component: body_joints

Initializes the MMPose class with configuration settings and IO handling capabilities.

This constructor takes care of all inference preparation steps, including setting up the input and output folders, configuring the keypoint mapping, and preparing the calibration parameters for 3D pose estimation.

Parameters:
  • config (dict) – A dictionary containing the configuration settings for the method detector.

  • io (class) – An instance of the IO class for input-output operations.

  • data (class) – An instance of the Data class for accessing data.

get_per_component_keypoint_mapping(keypoints_indices)[source]

Extracts and returns the indices and descriptions of keypoints for each component.

Parameters:

keypoints_indices (dict) – A dictionary containing the keypoint indices for each component. The keys are the component names (‘body_joints’, ‘hand_joints’, ‘face_landmarks’), and each value is a dictionary holding the indices of that component's keypoints. Note: This algorithm only supports the ‘body_joints’ component.

Returns:

A tuple containing two dictionaries.
  • The first dictionary contains the indices of keypoints for each component.

  • The second dictionary contains the descriptions of keypoints for each component.

Return type:

tuple

nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.extract_key_per_value(input_dict)[source]

Extracts keys from a dictionary based on the type of their values.

If all values in the dictionary are integers, it returns a list of keys. If any value is a list, it appends an index to the key to create a unique key.

Parameters:

input_dict (dict) – The input dictionary to extract keys from.

Returns:

A list of keys extracted from the input dictionary.

Return type:

list

Raises:
  • NotImplementedError – If a value in the dictionary is neither an integer nor a list.
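The behavior described above can be sketched as follows. This is an illustrative reimplementation, not the toolbox's code: the function name `extract_keys_sketch` is hypothetical, and the `key_index` suffix format for list values is an assumption, since the documentation only states that an index is appended to the key.

```python
def extract_keys_sketch(input_dict):
    """Illustrative sketch: derive one key per integer value or list element."""
    return_keys = []
    for key, value in input_dict.items():
        if isinstance(value, int):
            # Integer value: the key is used as-is.
            return_keys.append(key)
        elif isinstance(value, list):
            # List value: one unique key per element (suffix format is assumed).
            return_keys.extend(f"{key}_{i}" for i in range(len(value)))
        else:
            raise NotImplementedError(
                f"Unsupported value type for key '{key}': {type(value).__name__}"
            )
    return return_keys


plain = extract_keys_sketch({"nose": 0, "left_eye": 1})
expanded = extract_keys_sketch({"nose": 0, "face": [23, 24, 25]})
```

Here `plain` keeps the original keys, while `expanded` contains one derived key for each element of the `face` list.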