nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.HRNetw48
- class nicetoolbox.detectors.method_detectors.body_joints.mmpose_framework.HRNetw48(config, io, data)
Bases:
HRNetw48 is a subclass of MMPose specialized for pose estimation using the HRNetw48 model.
The HRNetw48 class runs the HRNetw48 model within the MMPose framework for pose estimation tasks. It provides the keypoint indices and descriptions for the body_joints component. Inference itself is started as a subprocess by the base_detector class, inside its run_inference method.
HRNetw48 is a High-Resolution Network (HRNet) model, a deep neural network for human pose estimation that maintains high-resolution representations throughout the network. It is particularly effective for tasks requiring precise localization of body joints.
Components: body_joints, hand_joints, face_landmarks
Initializes the MMPose class with configuration settings and IO handling capabilities.
This constructor takes care of all inference preparation steps, including setting up the input and output folders, configuring the keypoint mapping, and preparing the calibration parameters for 3D pose estimation.
- Parameters:
config (dict) – A dictionary containing the configuration settings for the method detector.
io (class) – An instance of the IO class for input-output operations.
data (class) – An instance of the Data class for accessing data.
Methods
get_image_folders(make_dirs=False) – Generate and return a dictionary of image folders for each camera.
get_per_component_keypoint_mapping(keypoints_indices) – Extracts and returns the indices and descriptions of keypoints for each component.
get_prediction_folders(make_dirs=False) – Generate and return a dictionary of prediction folders for each camera.
post_inference() – Post-inference processing for pose estimation components such as body_joints, hand_joints, and face_landmarks.
run_inference() – Runs the inference of the method detector in a separate terminal/cmd window using the specified virtual environment or conda environment.
visualization(data) – Generates a visualization video for each camera from the processed image frames.
Attributes
algorithm
components
- get_image_folders(make_dirs=False)
Generate and return a dictionary of image folders for each camera.
- Parameters:
make_dirs (bool) – If True, the function will create the directories if they do not exist.
- Returns:
A dictionary where the keys are the camera names and the values are the corresponding image folder paths.
- Return type:
dict
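As a rough illustration of the return value, the sketch below builds a camera-to-folder dictionary. The helper name `build_camera_folders`, the base directory, and the camera names are hypothetical and not part of nicetoolbox; only the shape of the returned dict and the `make_dirs` behavior follow the description above.

```python
import os
import tempfile


def build_camera_folders(base_dir, camera_names, make_dirs=False):
    """Hypothetical helper: map each camera name to a folder under base_dir."""
    folders = {}
    for name in camera_names:
        path = os.path.join(base_dir, name)
        if make_dirs:
            # exist_ok avoids an error when the folder is already present
            os.makedirs(path, exist_ok=True)
        folders[name] = path
    return folders


# Usage with made-up camera names inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    folders = build_camera_folders(tmp, ["cam1", "cam2"], make_dirs=True)
    created = all(os.path.isdir(p) for p in folders.values())
```

With `make_dirs=False` the same dictionary is returned, but no directories are created on disk.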
- get_per_component_keypoint_mapping(keypoints_indices)
Extracts and returns the indices and descriptions of keypoints for each component.
- Parameters:
keypoints_indices (dict) – A dictionary containing the indices of keypoints for each component. The keys of the dictionary are the component names (‘body_joints’, ‘hand_joints’, ‘face_landmarks’), and the values are dictionaries containing the index of each keypoint.
- Returns:
A tuple containing two dictionaries:
- The first dictionary contains the indices of keypoints for each component.
- The second dictionary contains the descriptions of keypoints for each component.
- Return type:
tuple
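The sketch below shows one way such a split could look. It assumes that each component maps keypoint names to integer indices; the actual nesting used by nicetoolbox may differ. The function name `split_keypoint_mapping` and the example keypoints are hypothetical.

```python
def split_keypoint_mapping(keypoints_indices, components):
    """Hypothetical sketch: return (indices, descriptions) per component."""
    indices, descriptions = {}, {}
    for component in components:
        mapping = keypoints_indices[component]
        # Sort by index so that both lists share the same order
        ordered = sorted(mapping.items(), key=lambda item: item[1])
        descriptions[component] = [name for name, _ in ordered]
        indices[component] = [idx for _, idx in ordered]
    return indices, descriptions


# Made-up keypoint names and indices, for illustration only
example = {
    "body_joints": {"left_eye": 1, "nose": 0, "right_eye": 2},
    "hand_joints": {"left_wrist": 91},
}
indices, descriptions = split_keypoint_mapping(example, ["body_joints"])
```

Only the components requested (here `body_joints`) appear in the two returned dictionaries.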
- get_prediction_folders(make_dirs=False)
Generate and return a dictionary of prediction folders for each camera.
- Parameters:
make_dirs (bool) – If True, the function will create the directories if they do not exist.
- Returns:
A dictionary where the keys are the camera names and the values are the corresponding prediction folder paths.
- Return type:
dict
- post_inference()
Post-inference processing for pose estimation components such as body_joints, hand_joints, and face_landmarks.
This method takes the raw 2D pose estimation results and applies a series of processing steps. They include optional filtering to smooth the results, interpolation to fill in missing values, undistortion using camera calibration parameters, and 3D triangulation from multiple camera views. The final processed results are saved for further analysis and visualization for each of the components.
Steps:
1. Filtering: Applies a smoothing filter to the 2D pose estimation results if filtering is enabled. This step reduces noise and improves the consistency of the pose data over time.
2. Interpolation: Fills in missing values in the 2D pose estimation results. This is crucial for maintaining the integrity of the pose data, especially in cases where occlusion or poor lighting conditions may lead to incomplete detections.
3. Undistortion: Corrects the 2D pose estimation results for lens distortion using the camera’s calibration parameters.
4. 3D Triangulation: Uses the undistorted 2D pose estimation results from at least two camera views to reconstruct the 3D positions of the pose keypoints.
5. Saving Results: The processed 3D pose data is saved to a .npz file with the following structure:
- ‘2d’: A numpy array containing the 2D pose estimation results.
- ‘2d_filtered’: A numpy array containing the filtered 2D pose estimation results.
- ‘2d_interpolated’: A numpy array containing the interpolated 2D pose estimation results.
- ‘bbox_2d’: A numpy array containing the 2D bounding box coordinates.
- ‘3d’: A numpy array containing the 3D pose estimation results.
- ‘data_description’: A dictionary containing the data description for the above output numpy arrays. See the documentation of the output for more details.
- Returns:
None. The processed results are saved to the output folder (see step 5).
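To make the interpolation and triangulation steps more concrete, here is a minimal sketch using NumPy. It is not the nicetoolbox implementation: the function names are hypothetical, the interpolation is plain linear interpolation over frames, and the triangulation is the standard direct linear transform (DLT), assuming a 3x4 projection matrix per camera and already undistorted 2D points.

```python
import numpy as np


def interpolate_missing(track):
    """Linearly fill NaN entries in a 1D coordinate track over frames."""
    track = np.asarray(track, dtype=float)
    frames = np.arange(track.size)
    valid = ~np.isnan(track)
    if not valid.any():
        return track.copy()  # nothing to interpolate from
    # np.interp holds the edge values for frames outside the valid range
    return np.interp(frames, frames[valid], track[valid])


def triangulate_point(projections, points_2d):
    """DLT triangulation of one keypoint from two or more camera views."""
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the 3D point
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]  # right singular vector of the smallest singular value
    return X[:3] / X[3]  # homogeneous to Euclidean coordinates


# Interpolation: the missing middle frame is filled from its neighbours
filled = interpolate_missing([1.0, np.nan, 3.0])

# Triangulation: two made-up cameras, the second shifted along the x axis
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 4.0, 1.0])
views = []
for P in (P1, P2):
    u = P @ point
    views.append((u[0] / u[2], u[1] / u[2]))
recovered = triangulate_point([P1, P2], views)
```

In this noise-free toy setup, `recovered` matches the original 3D point up to floating-point error. With real detections the DLT solution is a least-squares estimate.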
- run_inference() → None
Runs the inference of the method detector in a separate terminal/cmd window using the specified virtual environment or conda environment. Captures the output and logs the success or failure of the inference.
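The capture-and-log pattern described here can be sketched with the standard library alone. The example below is a simplified stand-in: it runs a command with the current Python interpreter rather than opening a separate terminal or activating a virtual or conda environment, and the function name `run_command` is hypothetical.

```python
import logging
import subprocess
import sys


def run_command(cmd):
    """Run cmd, capture its output, and log success or failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        logging.info("Inference finished: %s", result.stdout.strip())
    else:
        logging.error("Inference failed: %s", result.stderr.strip())
    return result.returncode == 0


# A trivial command stands in for the real inference script
ok = run_command([sys.executable, "-c", "print('done')"])
failed = run_command([sys.executable, "-c", "import sys; sys.exit(1)"])
```

Returning a boolean is a choice made for this sketch only; the documented method returns None and reports the outcome through logging.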
- visualization(data)
Generates a visualization video for each camera from the processed image frames.
This method takes the processed image frames for each camera and compiles them into a video file. It uses the frames_to_video() function from utils.video.py. The success of the video creation is tracked, and the method logs the outcome of the visualization process for each camera.
- Parameters:
data – An instance of a class that stores all data-related information, including the frame rate (fps) for the video and the starting frame number (video_start).
Note
The method assumes that the processed image frames are named in a specific format (%09d.jpg), where each frame’s name is a zero-padded nine-digit number representing its sequence in the video.
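For reference, the `%09d.jpg` pattern pads the frame number with leading zeros to a width of nine digits. The frame index below is an arbitrary example value.

```python
frame_index = 42  # arbitrary example frame number
filename = "%09d.jpg" % frame_index

# The stem is always nine characters wide for indices below one billion
stem_length = len(filename) - len(".jpg")
```

Here `filename` is `000000042.jpg`.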