Welcome to NICE Toolbox’s documentation!

Nonverbal Interpersonal Communication Exploration Toolbox

Project page | Documentation | Changelog | Contact: nicetoolbox@tue.mpg.de

🚀 We are releasing a new major version, 0.3.0, which adds SAM 3D Body model support, WhisperX audio transcription, an updated evaluation pipeline, connectors to ELAN and napari, and many other improvements and fixes. Please check the changelog for more information.

NICE Toolbox is an easy-to-use framework for exploring nonverbal human communication. It aims to enable the investigation of observable signs that reflect the mental state and behaviors of the individual. Additionally, these visual nonverbal cues reveal the interpersonal dynamics between people in face-to-face conversations.

NICE combines existing computer vision detectors into a single, easy-to-use framework. Working from single- or multi-camera video data, it covers whole-body pose estimation, gaze tracking, movement dynamics (kinematics), gaze interaction monitoring (mutual gaze), physical proximity between dyads, emotion detection and more. For a full list, see the components overview.

The toolbox also includes a visualizer module for interactively exploring outputs, an evaluation module that runs configurable metrics, and a collection of connectors for importing data from and exporting data to third-party tools (e.g. for labelling in ELAN or napari-deeplabcut).

Installation & getting started

For instructions on installing the toolbox on a Linux or Windows machine, please see the installation instructions page. For a quick start, the getting started page provides an example dataset and instructions for setting it up. Further tutorials and documentation can be found on the tutorials and wiki pages. You can also access this documentation offline by downloading it as a PDF from the ReadTheDocs pop-up menu in the bottom-right corner of the screen.

Future releases

In future releases, we plan to extend the NICE Toolbox to include detectors for facial expressions, head movements, eye closure, active speaking, emotional valence and arousal, and micro-action recognition.

Our goal is to provide comprehensive and objective evaluations of the algorithms, ultimately creating a practically useful toolbox for researchers analyzing human interaction and communication.

If you are interested in collaborating with us or contributing to the project, please reach out to us at nicetoolbox@tue.mpg.de.

Acknowledgments

The NICE Toolbox uses the following existing tools, methods, and frameworks: MMPose, MotionBERT, HigherHRNet, ViTPose, DarkPose, RTMPose, SAM 3D Body, ETH-XGaze, SPIGA, WhisperX, Py-FEAT, and rerun.io.

Authors

Aleksandr Evgrashin, Carolin Schmitt, Timo Lübbing, Ashutosh Jha, Sophie Bauer, Gökce Ergün, Senya Polikovsky.

All authors are with the Optics and Sensing Laboratory at the Max Planck Institute for Intelligent Systems.

We thank the MPI-IS Software Workshop for their thoughtful feedback and support during the project refactoring.

License

Some components of the NICE Toolbox use algorithms that are distributed under other licenses, which are listed in LICENSES_ALGORITHMS.md.

Changelog

0.3.0

SAM 3D Body (sam_3d_body) - 3D whole-body pose estimation. Supports single- and multi-view setups.
WhisperX (whisperx) - audio transcription and speaker diarization.
MotionBERT (motionbert) - 3D body pose lifting from precomputed 2D body joint detections.
New MMPose algorithms: vitpose_huge, rtmpose_l_aic, rtmpose_l_wholebody, and rtmpose_m_mpii.
New ELAN connector: export outputs to ELAN annotation format for manual labeling workflows.
New napari-deeplabcut connector: export body joint detections to DeepLabCut format.
Updated evaluation pipeline: new ground-truth-based metrics, improved configuration schema, and flexible group-by and aggregation options.
New asset download manager: model weights are now downloaded automatically during setup or first run.
New project config: a central config file per project that holds paths to your dataset and detector configs, decoupling project settings from the NICE Toolbox installation folder.
Algorithm instances: you can now create multiple configurations of the same algorithm with different parameters.
The sequence time-range fields video_start and video_stop now accept timestamps (e.g. "00:01:30") in addition to frame numbers.
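
To illustrate the timestamp option for video_start and video_stop, here is a minimal sketch of how an "HH:MM:SS" timestamp could map to a frame number. The function name to_frame_index, the fps argument, and the rounding rule are assumptions made for illustration; the toolbox's own conversion may differ.

```python
def to_frame_index(value, fps):
    """Return a frame index for a frame number or an "HH:MM:SS" timestamp.

    Hypothetical helper, not part of the NICE Toolbox API.
    """
    if isinstance(value, int):
        return value  # already a frame number, use it unchanged
    hours, minutes, seconds = (float(part) for part in value.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    # Assumption: round to the nearest frame at the given frame rate.
    return round(total_seconds * fps)


print(to_frame_index("00:01:30", fps=30))  # 2700 (90 seconds at 30 frames per second)
print(to_frame_index(450, fps=30))         # 450
```

With a helper like this, a sequence could be described either by frame numbers or by timestamps without changing the downstream frame-based logic.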

Breaking changes:

detectors_run_file.toml has changed. component_algorithm_mapping and the per-sequence components lists are deprecated. Use the algorithms list for all desired algorithm instances.
evaluation_config.toml was redesigned; please update it based on the provided example.
Separate evaluation summaries are deprecated; summaries are now part of metrics.
EvaluationWrapper for exporting evaluation results to pandas is deprecated.
frameworks in detectors_config.toml are deprecated and replaced by more general-purpose templates. Please update your config.

0.2.2

Refactoring of data preprocessing and inference for all detectors.
Major optimization and bug-fixing of py-feat inference.
Refactoring, optimization, and bug-fixing of multiview-ethgaze.
Refactoring of config placeholder resolution, making it faster and more stable.
New config validation system. It will detect missing required fields or wrong field types across all configs.
Fixes for subject tracking consistency in multiple detectors.
In detectors_run_file.toml you can set video_length = -1 to process all frames inside a video.
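
The kind of check a config validation system performs can be sketched as follows. This is an illustrative stand-in, not the toolbox's validator: the validate function, the schema dictionary, and the field types shown are assumptions.

```python
def validate(config, schema):
    """Collect error messages for missing fields and wrong field types.

    Hypothetical sketch: `schema` maps required field names to expected types.
    """
    errors = []
    for field, expected_type in schema.items():
        if field not in config:
            errors.append(f"missing required field: {field}")
        elif not isinstance(config[field], expected_type):
            actual = type(config[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return errors


# Assumed types for two fields named in this changelog.
schema = {"video_length": int, "log_level": str}

print(validate({"video_length": -1, "log_level": "INFO"}, schema))  # []
print(validate({"video_length": "-1"}, schema))
# ['video_length: expected int, got str', 'missing required field: log_level']
```

Collecting all problems before reporting, rather than stopping at the first one, lets a user fix a config in a single pass.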

Breaking changes:

The frame index leading zeroes format was extended from 05d to 09d to support longer videos. This results in new filenames.
CSV exported files are now saved inside individual video folders, not inside the root output folder. This can be customized in config.
All runtime placeholders now carry the cur_ prefix, i.e. <cur_placeholder_name>. For example, the <session_ID> placeholder was renamed to <cur_session_ID>.
Cyclic placeholder dependencies are deprecated. For example, git_hash = "<git_hash>" will now raise an error.
Placeholder shadowing is deprecated. Use unique placeholder names at each level of the config file.
NICE Toolbox now uses submodule forks of mmpose and SPIGA. Library versions remain the same, so there should be no changes in results.
Multiview-ETH-XGaze now supports multiview only inside NICE Toolbox. All logic for multi-camera fusion was moved to NICE.
eth_xgaze now exports raw 3d and 3d_filtered outputs for individual cameras, plus xgaze_gaze_fused and xgaze_gaze_fused_filtered, which are fused from all cameras.
eth_xgaze now exports landmarks_2d with confidence scores.
detectors_run_file.toml config now requires log_level and error_level fields to be set.
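
Two of the changes above are easy to see in a few lines of Python. The first part shows the effect of widening the zero-padding from 05d to 09d; the full file name pattern is not specified here, so only the index is formatted. The second part is a minimal sketch of how a cyclic placeholder such as git_hash = "<git_hash>" can be detected during resolution. The resolve function and its handling of unknown placeholders are assumptions, not the toolbox's resolver.

```python
import re

# Part 1: zero-padded frame indices.
frame = 42
old_index = f"{frame:05d}"  # "00042": five digits, so indices stay aligned up to 99,999
new_index = f"{frame:09d}"  # "000000042": nine digits for much longer videos

# Part 2: hypothetical placeholder resolver with cycle detection.
PLACEHOLDER = re.compile(r"<(\w+)>")


def resolve(key, config, _stack=()):
    """Expand <name> placeholders in config[key], raising on cyclic references."""
    if key in _stack:
        chain = " -> ".join(_stack + (key,))
        raise ValueError(f"cyclic placeholder dependency: {chain}")
    value = config[key]
    if not isinstance(value, str):
        return value

    def substitute(match):
        name = match.group(1)
        if name not in config:
            return match.group(0)  # leave unknown (e.g. runtime) placeholders as-is
        return str(resolve(name, config, _stack + (key,)))

    return PLACEHOLDER.sub(substitute, value)


config = {"root": "/data", "out": "<root>/<cur_session_ID>"}
print(resolve("out", config))  # /data/<cur_session_ID>
```

In this sketch, calling resolve("git_hash", {"git_hash": "<git_hash>"}) raises a ValueError, mirroring the error described above for cyclic dependencies.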

0.2.1

Evaluation module, Docker support, additional detector output, and many other improvements.

0.2.0

Code refactoring, easier installation, and new detectors for individual emotion and head orientation.

0.1.0

Initial release.