Welcome to NICE Toolbox’s documentation!¶

Nonverbal Interpersonal Communication Exploration Toolbox¶
Project page | Documentation | Changelog | Contact: nicetoolbox@tue.mpg.de
🚀 We are releasing a new major version, 0.3.0, which includes SAM 3D Body model support, WhisperX audio transcription, an updated evaluation pipeline, connectors to ELAN and napari, and many other improvements and fixes. Please check the changelog for more information.
NICE Toolbox is an easy-to-use framework for exploring nonverbal human communication. It aims to enable the investigation of observable signs that reflect the mental state and behaviors of the individual. Additionally, these visual nonverbal cues reveal the interpersonal dynamics between people in face-to-face conversations.
NICE combines existing computer vision detectors into a single, easy-to-use framework. Working from single- or multi-camera video data, it covers whole-body pose estimation, gaze tracking, movement dynamics (kinematics), gaze interaction monitoring (mutual gaze), physical proximity within dyads, emotion detection and more. For a full list, see the components overview.
The toolbox also includes a visualizer module for interactively exploring outputs, an evaluation module that runs configurable metrics, and a collection of connectors for importing and exporting data to third-party tools (e.g. for labelling in ELAN or napari-deeplabcut).
Installation & getting started¶
For instructions on installing the toolbox on a Linux or Windows machine, please see the installation instructions page. For a quick start, the getting started page provides an example dataset and documentation on how to set it up. Further tutorials and documentation can be found on the tutorials and wiki pages. You can also access this documentation offline by downloading it as a PDF from the ReadTheDocs pop-up menu in the bottom right corner of the screen.
Future releases¶
In future releases, we plan to extend the NICE Toolbox to include detectors for facial expressions, head movements, eye closure, active speaking, emotional valence and arousal, and micro-action recognition.
Our goal is to provide comprehensive and objective evaluations of the algorithms, ultimately creating a practically useful toolbox for researchers analyzing human interaction and communication.
If you are interested in collaborating with us or contributing to the project, please reach out to us at nicetoolbox@tue.mpg.de.
Acknowledgments¶
The NICE Toolbox uses the following existing tools, methods, and frameworks: MMPose, MotionBERT, HigherHRNet, ViTPose, DarkPose, RTMPose, SAM 3D Body, ETH-XGaze, SPIGA, WhisperX, Py-FEAT, and rerun.io.
License¶
NICE Toolbox © 2026 Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. is licensed under CC BY-NC-SA 4.0, see LICENSE.md.
Some components of the NICE Toolbox additionally use algorithms that are distributed under other licenses, listed in LICENSES_ALGORITHMS.md.
Changelog¶
0.3.0¶
SAM 3D Body (sam_3d_body) - 3D whole-body pose estimation. Supports single- and multi-view setups.
WhisperX (whisperx) - audio transcription and speaker diarization.
MotionBERT (motionbert) - 3D body pose lifting from precomputed 2D body joint detections.
New MMPose algorithms: vitpose_huge, rtmpose_l_aic, rtmpose_l_wholebody, and rtmpose_m_mpii.
New ELAN connector: export outputs to ELAN annotation format for manual labeling workflows.
New napari-deeplabcut connector: export body joint detections to DeepLabCut format.
Updated evaluation pipeline: new ground-truth-based metrics, improved configuration schema, and flexible group-by and aggregation options.
New asset download manager: model weights are now downloaded automatically during setup or first run.
New project config: a central config file per project that holds paths to your dataset and detector configs, decoupling project settings from the NICE Toolbox installation folder.
Algorithm instances support: you can now create multiple configurations of the same algorithm with different parameters.
The sequence time range fields video_start and video_stop now accept timestamps (e.g. "00:01:30") in addition to frame numbers.
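To make the timestamp option for video_start and video_stop concrete, here is a minimal Python sketch of how a value such as "00:01:30" could map to a frame number. The helper to_frame_index, the "HH:MM:SS" parsing, and the constant frame rate are assumptions for illustration only; the changelog does not describe how the toolbox performs this conversion internally.

```python
def to_frame_index(value, fps):
    """Convert a frame number or an "HH:MM:SS" timestamp to a frame index.

    Hypothetical helper for illustration; not part of the NICE Toolbox API.
    Assumes a constant frame rate `fps`.
    """
    if isinstance(value, int):
        return value  # already a frame number, pass it through unchanged
    hours, minutes, seconds = (float(part) for part in value.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return round(total_seconds * fps)


print(to_frame_index(2700, fps=30))        # frame number passes through
print(to_frame_index("00:01:30", fps=30))  # 90 s at 30 fps -> frame 2700
```

Under these assumptions both calls refer to the same position in a 30 fps video, which is why either form can express the same time range.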
Breaking changes:
detectors_run_file.toml has changed. component_algorithm_mapping and the per-sequence components lists are deprecated. Use the algorithms list for all desired algorithm instances.
evaluation_config.toml was redesigned; please update it based on the provided example.
Separate evaluation summaries are deprecated; they are now part of the metrics.
EvaluationWrapper for exporting evaluation results to pandas is deprecated.
frameworks in detectors_config.toml are deprecated. Use the more general templates instead. Please update your config.
0.2.2¶
Refactoring of data preprocessing and inference for all detectors.
Major optimization and bug-fixing of py-feat inference.
Refactoring, optimization, and bug-fixing of multiview-ethgaze.
Refactoring of config placeholders resolution, making it faster and more stable.
New config validation system. It will detect missing required fields or wrong field types across all configs.
Fixes for subject tracking consistency in multiple detectors.
In detectors_run_file.toml you can set video_length = -1 to process all frames inside a video.
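The config validation mentioned in this release (detecting missing required fields and wrong field types) can be pictured with the following Python sketch. It is not the toolbox's validator: the function validate_config and the schema format are hypothetical. The field names are borrowed from this changelog, while their expected types are assumptions.

```python
def validate_config(config, schema):
    """Return a list of human-readable problems found in `config`.

    `schema` maps each required field name to its expected Python type.
    Both the function and the schema format are hypothetical.
    """
    problems = []
    for field, expected_type in schema.items():
        if field not in config:
            problems.append(f"missing required field: {field}")
        elif not isinstance(config[field], expected_type):
            actual = type(config[field]).__name__
            problems.append(
                f"wrong type for {field}: "
                f"expected {expected_type.__name__}, got {actual}"
            )
    return problems


# Hypothetical schema and config; the types are assumed, not documented.
schema = {"video_length": int, "log_level": str, "error_level": str}
config = {"video_length": "all", "log_level": "INFO"}

for problem in validate_config(config, schema):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets a user fix a config in a single pass.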
Breaking changes:
The frame index leading-zero format was extended from 05d to 09d to support longer videos. This results in new filenames.
Exported CSV files are now saved inside individual video folders, not inside the root output folder. This can be customized in the config.
All runtime placeholders now follow the pattern cur_<placeholder_name>. For example, the <session_ID> placeholder was renamed to <cur_session_ID>.
Cyclic placeholder dependencies are deprecated. For example, git_hash = "<git_hash>" will now raise an error.
Placeholder shadowing is deprecated. Use unique placeholder names at each level of the config file.
NICE Toolbox now uses submodule forks of mmpose and SPIGA. Library versions remain the same, so there should be no changes in results.
Multiview-ETH-XGaze now supports multiview only inside NICE Toolbox. All logic for multi-camera fusion was moved to NICE.
eth_xgaze now exports raw 3d and 3d_filtered for individual cameras and xgaze_gaze_fused and xgaze_gaze_fused_filtered fused from all cameras.
eth_xgaze now exports landmarks_2d with confidence scores.
detectors_run_file.toml config now requires log_level and error_level fields to be set.
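The placeholder changes listed above (the cur_ prefix for runtime placeholders and the error on cyclic dependencies) can be illustrated with a small Python sketch of placeholder resolution. This is an assumed, simplified model and not the toolbox's resolver: the function resolve, the example path, and the result_folder key are hypothetical.

```python
import re

# Matches a placeholder such as <cur_session_ID> and captures its name.
PLACEHOLDER = re.compile(r"<([^<>]+)>")


def resolve(name, values, _stack=()):
    """Resolve <placeholder> references in values[name], rejecting cycles."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise ValueError(f"cyclic placeholder dependency: {chain}")
    if name not in values:
        raise KeyError(f"unknown placeholder: <{name}>")

    def substitute(match):
        # Resolve the referenced placeholder, remembering the path taken.
        return resolve(match.group(1), values, _stack + (name,))

    return PLACEHOLDER.sub(substitute, str(values[name]))


values = {
    "output_folder": "/data/out",    # hypothetical path
    "cur_session_ID": "session_01",  # runtime placeholder with cur_ prefix
    "result_folder": "<output_folder>/<cur_session_ID>",
}
print(resolve("result_folder", values))

try:
    resolve("git_hash", {"git_hash": "<git_hash>"})
except ValueError as error:
    print(error)
```

Tracking the chain of names currently being resolved is one simple way to detect a self-reference such as git_hash = "<git_hash>" before it recurses indefinitely.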
0.2.1¶
Evaluation module, Docker support, additional detector output, and many other improvements.
0.2.0¶
Code refactoring, easier installation, and new detectors for individual emotion and head orientation.
0.1.0¶
Initial release.