Welcome to NICE Toolbox’s documentation!

Non-Verbal Interpersonal Communication Exploration Toolbox

Project page | Documentation | Changelog | Contact: nicetoolbox@tue.mpg.de


🚀 We are releasing version 0.2.2, which includes numerous performance improvements and bug fixes. Please check the changelog for more information.

NICE Toolbox is an easy-to-use framework for exploring nonverbal human communication. It aims to enable the investigation of observable signs that reflect the mental state and behaviors of the individual. Additionally, these visual nonverbal cues reveal the interpersonal dynamics between people in face-to-face conversations.

NICE Toolbox incorporates a growing set of computer vision algorithms to track and identify important visual components of nonverbal communication. Existing deep-learning and rule-based algorithms are combined into a single, easy-to-use software toolbox. Working from single- or multi-camera video data, the initial release covers whole-body pose estimation and gaze tracking for each individual, as well as movement dynamics calculation (kinematics), gaze interaction monitoring (mutual gaze), measurement of the physical distance between the two people in a dyad, and emotion detection. This first set of components and algorithms will be extended in future releases. For more details, please see the components overview page in the wiki.

The toolbox also includes a visualizer module, which allows users to visualize and investigate the algorithms’ outputs.

Installation & getting started

For instructions on installing the toolbox on a Linux or Windows machine, please see the installation instructions page. For a quick start, the getting started page provides an example dataset and instructions for setting it up. Further tutorials and documentation can be found on the tutorials and wiki pages. You can also read this documentation offline by downloading it as a PDF from the ReadTheDocs pop-up menu in the bottom right corner of the screen.

Code structure

[Figure: toolbox design overview (toolbox_design.png)]

Future releases

In future releases, we plan to extend the NICE Toolbox to include detectors for facial expressions, head movements, eye closure, active speaking, emotional valence and arousal, and micro-action recognition.

Further, we will move beyond mere visual inspection and integrate a versatile evaluation framework. Based on our experience in computer vision, we are aware that no single algorithm performs flawlessly across all capture settings. To help you choose the best algorithms for your setting, we are developing an evaluation workflow that clarifies the limitations of the algorithms, allows for systematic comparisons between them, and assesses their accuracy within a given setting. Our goal is to provide comprehensive and objective evaluations of the algorithms, ultimately creating a practically useful toolbox for researchers analyzing human interaction and communication.

If you are interested in collaborating with us or contributing to the project, please reach out to us at nicetoolbox@tue.mpg.de.

Acknowledgments

The NICE Toolbox uses the following existing tools, methods, and frameworks: MMPose, HigherHRNet, ViTPose, DarkPose, ETH-XGaze, SPIGA, Py-FEAT, and rerun.io.

Authors

Carolin Schmitt, Gökce Ergün, Timo Lübbing, Ashutosh Jha, Senya Polikovsky, Aleksandr Evgrashin

All authors are with the Optics and Sensing Laboratory at the Max Planck Institute for Intelligent Systems.

We thank the MPI-IS Software Workshop for their thoughtful feedback and support during the project refactoring.

License

NICE Toolbox © 2025 by Carolin Schmitt, Gökce Ergün, Timo Lübbing, Ashutosh Jha, Senya Polikovsky, Aleksandr Evgrashin is licensed under CC BY-NC-SA 4.0, see LICENSE.md.

Some components of the NICE Toolbox additionally use algorithms that are distributed under other licenses, listed in LICENSES_ALGORITHMS.md.

Changelog

0.2.2

  • Refactoring of data preprocessing and inference for all detectors.

  • Major optimization and bug-fixing of py-feat inference.

  • Refactoring, optimization, and bug-fixing of multiview-ethgaze.

  • Refactoring of config placeholders resolution, making it faster and more stable.

  • New config validation system. It will detect missing required fields or wrong field types across all configs.

  • Fixes for subject tracking consistency in multiple detectors.

  • In detectors_run_file.toml you can set video_length = -1 to process all frames inside a video.
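
To give a feel for the kind of check the config validation entry describes, here is a minimal Python sketch. It is not the toolbox's validation code: the `REQUIRED_FIELDS` schema, the `dataset_name` field, and the helpers `validate_config` and `frames_to_process` are hypothetical names chosen for illustration. Only the `video_length = -1` convention comes from the entry above, and capping at the number of available frames is an assumption.

```python
# Illustrative sketch only; not the NICE Toolbox validation code.
# Hypothetical schema: field name -> expected type.
REQUIRED_FIELDS = {
    "video_length": int,  # field named in the changelog
    "dataset_name": str,  # hypothetical field, for illustration only
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            problems.append(f"missing required field: {name}")
            continue
        value = config[name]
        # Exact type check, so that e.g. True is not accepted as an int.
        if type(value) is not expected:
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def frames_to_process(video_length: int, total_frames: int) -> int:
    """Interpret video_length, where -1 means 'all frames in the video'.

    Capping at total_frames is an assumption of this sketch.
    """
    if video_length == -1:
        return total_frames
    return min(video_length, total_frames)


if __name__ == "__main__":
    print(validate_config({"video_length": "all"}))
    print(frames_to_process(-1, 250))
```

Collecting all problems before reporting, rather than stopping at the first one, lets a user fix a config file in a single pass.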

Breaking changes:

  • The zero-padding format for frame indices was extended from 05d to 09d to support longer videos. As a result, output filenames change.

  • CSV exported files are now saved inside individual video folders, not inside the root output folder. This can be customized in config.

  • All runtime placeholders now carry the cur_ prefix. For example, the <session_ID> placeholder was renamed to <cur_session_ID>.

  • Cyclic placeholder dependencies are no longer supported. For example, git_hash = "<git_hash>" will now raise an error.

  • Placeholder shadowing is deprecated. Use unique placeholder names at each level of the config file.

  • NICE Toolbox now uses submodule forks of mmpose and SPIGA. Library versions remain the same, so there should be no changes in results.

  • Multiview-ETH-XGaze now supports multiview only inside NICE Toolbox. All logic for multi-camera fusion was moved to NICE.

  • eth_xgaze now exports the raw 3d and 3d_filtered outputs for individual cameras, plus xgaze_gaze_fused and xgaze_gaze_fused_filtered, which are fused from all cameras.

  • eth_xgaze now exports landmarks_2d with confidence scores.

  • detectors_run_file.toml config now requires log_level and error_level fields to be set.
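
Two of the breaking changes above lend themselves to a short Python sketch: the wider zero-padding of frame indices and the rejection of cyclic placeholders. The code is an illustration under stated assumptions, not the toolbox's implementation. `frame_index`, `resolve`, and the example keys `root` and `output_folder` are hypothetical names, and the actual resolver may report cycles differently.

```python
import re

# Illustrative sketch only; not the NICE Toolbox implementation.


def frame_index(idx: int, width: int = 9) -> str:
    """Zero-pad a frame index; width=5 mimics the old 05d format."""
    return f"{idx:0{width}d}"


# Matches <name> style placeholders inside a string value.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def resolve(name: str, values: dict, _stack: tuple = ()) -> str:
    """Expand placeholders in values[name], raising on cyclic references."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"cyclic placeholder dependency: {chain}")

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            # Assumption: unknown names (e.g. runtime placeholders such as
            # <cur_session_ID>) are left untouched for a later stage.
            return match.group(0)
        return resolve(key, values, _stack + (name,))

    return PLACEHOLDER.sub(substitute, values[name])


if __name__ == "__main__":
    print(frame_index(42, width=5))  # old format: 00042
    print(frame_index(42))           # new format: 000000042

    config = {
        "root": "/data",
        "output_folder": "<root>/results/<cur_session_ID>",
        "git_hash": "<git_hash>",  # cyclic example from the changelog
    }
    print(resolve("output_folder", config))
    try:
        resolve("git_hash", config)
    except ValueError as err:
        print(err)
```

With five digits, indices keep a fixed width only up to 99999, which is under an hour of video at an assumed 30 frames per second. Nine digits remove that limit for practical purposes.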

0.2.1

  • Evaluation module, Docker support, additional detector output, and many other improvements.

0.2.0

  • Code refactoring, easier installation, and new detectors for individual emotion and head orientation.

0.1.0

  • Initial release.