The SELMA diarization prototype transcribes speech and segments (diarizes) it by speaker. It can also identify public or otherwise known speakers and add a gender label to each speaker. User feedback, mostly corrections of wrongly labeled person names and other proper names, can be used to retrain the models and improve detection.

Screenshot of the Diarization Prototype

The image shows the diarization UI. At the top is the main menu, including the field for uploading a media item, which can then be transcribed and diarized. Feedback can be submitted and used to retrain a model, and the update can be applied to this instance or to all instances. The segmentation and speaker labels can be downloaded. In the middle section, a thumbnail is shown next to the segmented audio file. At the bottom, the speakers of the media item are listed along with the transcribed text and timestamps.
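To make the description above more concrete, the following is a minimal sketch of how diarized segments (speaker, gender label, transcribed text, timestamps) might be represented, grouped by speaker, and corrected through user feedback. It is not the prototype's actual data model or export format: the `Segment` class, its field names, and the functions `group_by_speaker` and `apply_name_feedback` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """One diarized stretch of speech (hypothetical structure, not the prototype's)."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # speaker label or recognized name
    gender: str    # gender label assigned by the system
    text: str      # transcribed text for this segment


def group_by_speaker(segments):
    """Return a dict mapping each speaker to their segments in time order."""
    grouped = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        grouped[seg.speaker].append(seg)
    return dict(grouped)


def apply_name_feedback(segments, wrong_name, corrected_name):
    """Return new segments with a wrongly labeled speaker name replaced.

    This only relabels the data; retraining a model from such
    corrections is outside the scope of this sketch.
    """
    return [
        replace(seg, speaker=corrected_name) if seg.speaker == wrong_name else seg
        for seg in segments
    ]


# Made-up example data, not taken from the prototype.
example_segments = [
    Segment(0.0, 4.2, "Speaker A", "female", "Good evening."),
    Segment(4.2, 9.0, "Speaker B", "male", "Thank you for having me."),
    Segment(9.0, 12.5, "Speaker A", "female", "Let us begin."),
]
by_speaker = group_by_speaker(example_segments)
corrected = apply_name_feedback(example_segments, "Speaker B", "Example Name")
```

Keeping the segments immutable and returning corrected copies leaves the original system output intact, so the original and corrected labels can both be kept as feedback.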

SELMA contribution

  • Development of Prototype UI
  • Integration of models and components, including user feedback 
  • User Evaluation for improved user experience


  • SELMA diarization prototype (not publicly available due to data protection)
