nicetoolbox.detectors.method_detectors.whisperx.whisperx_detector

WhisperX method detector class (mock/debug implementation).

Classes

WhisperX

Initialize base method detector with references.

class nicetoolbox.detectors.method_detectors.whisperx.whisperx_detector.WhisperX(io: SequenceIO, data: SequenceData, sequence_context: SequenceRuntimeConfig, algorithm_instance: str)

Initialize base method detector with references.

post_inference() → None

Processes the individual speaker-aligned transcription JSON outputs into our final JSON format.

The resulting JSON has the following structure:

    {
        "track_name": {
            "total": {
                "text": "full concatenated transcription text for the track",
                "start": start_time_of_first_segment,
                "end": end_time_of_last_segment
            },
            "segments": [
                {
                    "start": segment_start_time,
                    "end": segment_end_time,
                    "text": "segment_transcription_text",
                    "avg_logprob": segment_avg_log_probability
                },
                ...
            ],
            "word_segments": [
                {
                    "word": word_text,
                    "start": word_start_time,
                    "end": word_end_time,
                    "score": word_log_probability_score,
                    "speaker": speaker_label
                },
                ...
            ],
            "language": detected_language
        }
    }

Here, score is the word's log probability score, and speaker is the speaker label provided by pyannote.
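Below is a minimal sketch of how one track's aligned output could be mapped onto this structure. It assumes, hypothetically, that each track's aligned JSON is already loaded as a dict with "segments", "word_segments" and "language" keys. The names build_track_entry and build_output are illustrative and not part of the toolbox's API. Joining segment texts with single spaces is also an assumption about how the "total" text is concatenated.

```python
from typing import Any


def build_track_entry(aligned: dict[str, Any]) -> dict[str, Any]:
    """Convert one track's aligned output into the final per-track entry.

    Hypothetical helper: assumes `aligned` holds "segments", "word_segments"
    and "language" keys.
    """
    segments = aligned.get("segments", [])
    # Assumption: the track transcript is the segment texts joined by spaces.
    full_text = " ".join(seg["text"].strip() for seg in segments)
    total = {
        "text": full_text,
        # Start of the first segment and end of the last one (None if empty).
        "start": segments[0]["start"] if segments else None,
        "end": segments[-1]["end"] if segments else None,
    }
    return {
        "total": total,
        # Keep only the documented per-segment fields.
        "segments": [
            {
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
                "avg_logprob": seg.get("avg_logprob"),
            }
            for seg in segments
        ],
        "word_segments": aligned.get("word_segments", []),
        "language": aligned.get("language"),
    }


def build_output(tracks: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Map each track name to its final entry (hypothetical helper)."""
    return {name: build_track_entry(aligned) for name, aligned in tracks.items()}
```

For example, a track with segments " Hello" (0.0 to 1.5 s) and " world" (2.0 to 3.0 s) yields a total entry with text "Hello world", start 0.0 and end 3.0.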

visualization(_) → None

Generates visualizations by overlaying SRT subtitles onto video files.

Uses the SRT files generated as extra outputs of the audio transcription component. These SRT files are raw WhisperX outputs containing the final speaker-aligned transcription segments with speaker labels.
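Before subtitles can be overlaid, the SRT files must be read into timed cues. The following is a generic sketch of a standard SRT parser and is not the toolbox's own code. The Cue class and parse_srt function are hypothetical names. Speaker labels are not interpreted here; they are kept as part of the cue text in whatever form the SRT file contains them.

```python
import re
from dataclasses import dataclass

# Matches SRT timestamps such as "00:01:02,500" (comma or dot before milliseconds).
_TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
_TIMING_LINE = re.compile(_TIME + r"\s*-->\s*" + _TIME)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # subtitle text, possibly spanning several lines


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, in the order they appear in the file."""
    cues = []
    # SRT blocks (index, timing line, text lines) are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = _TIMING_LINE.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the subtitle text.
                text = "\n".join(lines[i + 1:])
                cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return cues
```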

We create a new video from scratch, using the video frames if they are available or a black background otherwise, and overlay the SRT subtitles onto it.
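The following sketch shows how the rendering loop might decide which subtitle to draw on each output frame, and how many frames to produce when it falls back to a black background. It is an illustration under stated assumptions and not the toolbox's implementation. captions_per_frame and frame_count are hypothetical names. Cues are assumed to be (start_s, end_s, text) tuples sorted by start time and not overlapping. The black-background length is assumed to be derived from the audio duration and a chosen frame rate.

```python
import bisect
from typing import Optional, Sequence


def captions_per_frame(
    cues: Sequence[tuple[float, float, str]],
    n_frames: int,
    fps: float,
) -> list[Optional[str]]:
    """Return the subtitle text active at each frame, or None if no cue covers it.

    Assumes `cues` are sorted by start time and do not overlap.
    """
    starts = [start for start, _, _ in cues]
    captions: list[Optional[str]] = []
    for frame_idx in range(n_frames):
        t = frame_idx / fps  # timestamp of this frame in seconds
        # Index of the last cue that starts at or before t.
        i = bisect.bisect_right(starts, t) - 1
        # A cue is active on the half-open interval [start, end).
        if i >= 0 and t < cues[i][1]:
            captions.append(cues[i][2])
        else:
            captions.append(None)
    return captions


def frame_count(video_frames: Optional[Sequence], duration_s: float, fps: float) -> int:
    """Use the real frames when available; otherwise size a black background to the audio."""
    if video_frames:
        return len(video_frames)
    return int(round(duration_s * fps))
```

Using bisect keeps each per-frame lookup logarithmic in the number of cues. This matters when long recordings produce many segments.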