Diarization

The SELMA diarization prototype transcribes speech and diarizes it, segmenting the audio by speaker. It can also identify public or well-known speakers and adds a gender label. User feedback, mostly about incorrectly labeled person names and other proper names, can be used to retrain the models and improve detection.

The image shows the diarization UI. The main menu at the top includes a field for uploading a media item, which can then be transcribed and diarized. Users can submit feedback, which is used to retrain a model; the retrained model can update either this instance only or all instances. The segmentation and speaker labels can be downloaded. The middle section shows a thumbnail next to the segmented audio file. The bottom section lists the speakers of the media item together with the transcribed text and timestamps.

SELMA contribution

Development of Prototype UI
Integration of models and components, including user feedback
User evaluation to improve the user experience

More

SELMA diarization prototype (not publicly available due to data protection)
