The SELMA diarization prototype transcribes speech and segments it by speaker (diarization). It can also identify public or otherwise known speakers and add a gender label. User feedback, mostly about wrongly labeled person names and other proper names, can be used to retrain the models and improve detection.
The image shows the diarization UI. At the top is the main menu, including the field for uploading a media item. An uploaded media item can be transcribed and diarized. Feedback can be submitted and used to retrain a model, and the retrained model can be used to update this instance or all instances. The segmentation and speaker labels can be downloaded. In the middle section, a thumbnail is shown next to the segmented audio file. At the bottom, the speakers of the media item are listed along with the transcribed text and timestamps.
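The speaker list at the bottom of the UI can be pictured as a set of labeled segments. The following is a minimal sketch in Python of how such output might be represented and grouped per speaker. The `Segment` class, its field names, and both helper functions are hypothetical illustrations and do not reflect the prototype's actual download format or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized stretch of speech (hypothetical structure)."""
    speaker: str   # a known speaker's name or a generic label
    gender: str    # gender label attached to the speaker
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed text for this segment


def group_by_speaker(segments):
    """Group segments per speaker, keeping each speaker's segments in time order."""
    grouped = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        grouped[seg.speaker].append(seg)
    return dict(grouped)


def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS, dropping fractions of a second."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    demo = [
        Segment("Speaker 2", "female", 5.0, 9.5, "Thank you for having me."),
        Segment("Speaker 1", "male", 0.0, 5.0, "Welcome to the programme."),
    ]
    for speaker, segs in group_by_speaker(demo).items():
        for seg in segs:
            print(speaker, format_timestamp(seg.start), format_timestamp(seg.end), seg.text)
```

In this sketch, a feedback submission that corrects a wrongly labeled name would amount to changing the `speaker` field of the affected segments.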
SELMA contribution
- Development of Prototype UI
- Integration of models and components, including user feedback
- User Evaluation for improved user experience
More
- SELMA diarization prototype (not publicly available due to data protection)