nicetoolbox.detectors.data_handlers.audio_handler
Audio data extraction and organization handler.
Extracts audio from video files or locates standalone audio files, then organizes them into the nicetoolbox_input/audio/ folder.
DESIGN DECISIONS:
1. Audio is ALWAYS extracted in FULL (no time-range cutting). The time range is passed in the recipe for inference scripts to use with librosa’s offset/duration parameters.
2. No preprocessing (resampling, normalization); inference scripts own that.
3. Track configuration from dataset_properties.toml drives extraction.
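The first design decision means an inference script receives the full audio file plus a time range, and selects the window itself. A minimal sketch of that step is shown below. It assumes the recipe exposes the range in milliseconds (mirroring the audio_start_ms and audio_length_ms constructor arguments) and that a missing or non-positive length means "read to the end". The helper names librosa_window_kwargs and load_window are hypothetical and are not part of nicetoolbox.

```python
from __future__ import annotations


def librosa_window_kwargs(audio_start_ms: float, audio_length_ms: float | None) -> dict:
    """Convert a millisecond time range into keyword arguments for librosa.load."""
    if audio_start_ms < 0:
        raise ValueError("audio_start_ms must be non-negative")
    # librosa.load expects offset and duration in seconds.
    kwargs = {"offset": audio_start_ms / 1000.0}
    # Assumption: None or a non-positive length means "read to the end of the file".
    if audio_length_ms is not None and audio_length_ms > 0:
        kwargs["duration"] = audio_length_ms / 1000.0
    else:
        kwargs["duration"] = None
    return kwargs


def load_window(audio_path: str, audio_start_ms: float, audio_length_ms: float | None):
    """Load only the requested window from the fully extracted audio file."""
    import librosa  # imported lazily so the helper above has no dependencies

    # sr=None keeps the file's native sampling rate; any resampling is left
    # to the inference script, in line with the "no preprocessing" decision.
    window = librosa_window_kwargs(audio_start_ms, audio_length_ms)
    return librosa.load(audio_path, sr=None, **window)
```

Keeping the conversion in a small pure function makes the millisecond-to-second handling easy to check without touching any audio file.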
Classes
AudioDataHandler | Handles audio data extraction and organization.
- class nicetoolbox.detectors.data_handlers.audio_handler.AudioDataHandler(io: SequenceIO, sequence_context: SequenceRuntimeConfig, audio_start_ms: float, audio_length_ms: float, tracks_config: dict[str, nicetoolbox.configs.schemas.dataset_properties.AudioTrackConfig])
Handles audio data extraction and organization.
Uses the track configuration from dataset_properties to determine which audio streams to extract and from where.
- property modality_name: str
Return the name of this modality (e.g., ‘video’, ‘audio’).