nicetoolbox.detectors.data_handlers.audio_handler

Audio data extraction and organization handler.

Extracts audio from video files or locates standalone audio files, then organizes them into the nicetoolbox_input/audio/ folder.

DESIGN DECISIONS:

  1. Audio is ALWAYS extracted in FULL (no time-range cutting). The time range is passed in the recipe for inference scripts to use with librosa’s offset/duration parameters.

  2. No preprocessing (resampling, normalization); inference scripts own that.

  3. Track configuration from dataset_properties.toml drives extraction.
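As a rough illustration of the first design decision, an inference script might turn the recipe's millisecond time range into the seconds-based `offset` and `duration` arguments of `librosa.load`. This is a minimal sketch: the helper names `librosa_window` and `load_segment` are hypothetical and not part of nicetoolbox, and it assumes the time range arrives as a start and a length in milliseconds, as the `audio_start_ms` and `audio_length_ms` constructor arguments suggest.

```python
from pathlib import Path


def librosa_window(audio_start_ms: float, audio_length_ms: float) -> dict[str, float]:
    """Convert a millisecond time range into librosa.load keyword arguments.

    Hypothetical helper: librosa expects offset and duration in seconds.
    """
    return {
        "offset": audio_start_ms / 1000.0,
        "duration": audio_length_ms / 1000.0,
    }


def load_segment(audio_path: Path, audio_start_ms: float, audio_length_ms: float):
    """Load only the requested window from the full, uncut audio file."""
    import librosa  # imported lazily so the helper above works without librosa

    # sr=None keeps the file's native sample rate; any resampling is left
    # to the inference script, in line with the "no preprocessing" decision.
    return librosa.load(
        audio_path,
        sr=None,
        **librosa_window(audio_start_ms, audio_length_ms),
    )
```

Keeping the cut on the consumer side means one extracted file can serve several time ranges without re-running extraction.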

Classes

AudioDataHandler

Handles audio data extraction and organization.

class nicetoolbox.detectors.data_handlers.audio_handler.AudioDataHandler(io: SequenceIO, sequence_context: SequenceRuntimeConfig, audio_start_ms: float, audio_length_ms: float, tracks_config: dict[str, nicetoolbox.configs.schemas.dataset_properties.AudioTrackConfig])

Handles audio data extraction and organization.

Uses the track configuration from dataset_properties to determine which audio streams to extract and from where.

get_recipe() → AudioInputRecipe

Build audio input recipe with full file paths + time range.

property modality_name: str

Return the name of this modality (e.g., ‘video’, ‘audio’).

prepare() → None

Prepare audio data based on track configuration.

Audio is always extracted in FULL. Already-extracted files are reused. Sets self._available = True if at least one track was prepared.
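The extract-once, reuse-afterwards behaviour described for prepare() could look roughly like the sketch below. It is not the handler's actual code: `build_ffmpeg_command` and `prepare_track` are hypothetical names, and the use of the ffmpeg CLI with 16-bit PCM WAV output is an assumption made for illustration. The point to note is that no seek or duration flags are passed, so the whole stream is written out.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: Path, out_path: Path, stream_index: int = 0) -> list[str]:
    """Build an ffmpeg call that writes one full audio stream to out_path."""
    return [
        "ffmpeg",
        "-i", str(video_path),
        "-map", f"0:a:{stream_index}",  # select the n-th audio stream of input 0
        "-vn",                          # drop video
        "-acodec", "pcm_s16le",         # assumed output codec (WAV, 16-bit PCM)
        str(out_path),
    ]


def prepare_track(
    video_path: Path,
    out_path: Path,
    stream_index: int = 0,
    runner=subprocess.run,
) -> bool:
    """Extract a track unless it already exists.

    Returns True if a new file was extracted, False if an existing one was reused.
    """
    if out_path.exists():
        return False  # already extracted: reuse it
    out_path.parent.mkdir(parents=True, exist_ok=True)
    runner(build_ffmpeg_command(video_path, out_path, stream_index), check=True)
    return True
```

A caller could then mark the modality as available once at least one configured track has an output file on disk.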