Tags: Voice Activity Detection, pyannote.audio, PyTorch, pyannote, pyannote-audio-model, audio, voice, speech, speaker, speaker-segmentation, overlapped-speech-detection, resegmentation
How to use zermok/segmentation with pyannote.audio:

```python
from pyannote.audio import Model, Inference

model = Model.from_pretrained("zermok/segmentation")
inference = Inference(model)

# inference on the whole file
inference("file.wav")

# inference on an excerpt
from pyannote.core import Segment
excerpt = Segment(start=2.0, end=5.0)
inference.crop("file.wav", excerpt)
```
Hervé Bredin committed
Commit d24d0d1 · Parent(s): 0915f86
fix: update RR to match with latest code/paper version
Files changed:
- README.md +6 -6
- reproducible_research/expected_outputs/osd/AMI.development.rttm +0 -0
- reproducible_research/expected_outputs/osd/AMI.test.rttm +0 -0
- reproducible_research/expected_outputs/osd/DIHARD.development.rttm +0 -0
- reproducible_research/expected_outputs/osd/DIHARD.test.rttm +0 -0
- reproducible_research/expected_outputs/osd/VoxConverse.development.rttm +0 -0
- reproducible_research/expected_outputs/osd/VoxConverse.test.rttm +0 -0
- reproducible_research/expected_outputs/rsg/AMI.development.rttm +0 -0
- reproducible_research/expected_outputs/rsg/AMI.test.rttm +0 -0
- reproducible_research/expected_outputs/rsg/DIHARD.development.rttm +0 -0
- reproducible_research/expected_outputs/rsg/DIHARD.test.rttm +0 -0
- reproducible_research/expected_outputs/rsg/VoxConverse.development.rttm +0 -0
- reproducible_research/expected_outputs/vad/AMI.development.rttm +0 -0
- reproducible_research/expected_outputs/vad/AMI.test.rttm +0 -0
- reproducible_research/expected_outputs/vad/DIHARD.development.rttm +0 -0
- reproducible_research/expected_outputs/vad/DIHARD.test.rttm +0 -0
- reproducible_research/expected_outputs/vad/VoxConverse.development.rttm +0 -0
- reproducible_research/expected_outputs/vad/VoxConverse.test.rttm +0 -0
- reproducible_research/expected_outputs/vbx/AMI.rttm +0 -0
- reproducible_research/expected_outputs/vbx/DIHARD.rttm +0 -0
- reproducible_research/expected_outputs/vbx/VoxConverse.rttm +0 -0
README.md CHANGED

```diff
@@ -90,15 +90,15 @@ In order to reproduce the results of the paper ["End-to-end speaker segmentation
 
 Voice activity detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
-AMI Mix-Headset | 0.
-DIHARD3 | 0.
-VoxConverse | 0.
+AMI Mix-Headset | 0.684 | 0.577 | 0.181 | 0.037
+DIHARD3 | 0.767 | 0.377 | 0.136 | 0.067
+VoxConverse | 0.767 | 0.713 | 0.182 | 0.501
 
 Overlapped speech detection | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
-AMI Mix-Headset | 0.
-DIHARD3 | 0.
-VoxConverse | 0.
+AMI Mix-Headset | 0.448 | 0.362 | 0.116 | 0.187
+DIHARD3 | 0.430 | 0.320 | 0.091 | 0.144
+VoxConverse | 0.587 | 0.426 | 0.337 | 0.112
 
 Resegmentation of VBx | `onset` | `offset` | `min_duration_on` | `min_duration_off`
 ----------------|---------|----------|-------------------|-------------------
```
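The four hyperparameters in these tables can be read as a hysteresis thresholding step followed by duration-based cleanup. The sketch below is a simplified, pure-Python illustration of that idea, not pyannote.audio's actual implementation: the function name `binarize`, the per-frame `scores` list, and the `frame_step` argument are assumptions made for the example. It assumes that a region opens when the score rises above `onset`, closes when it falls below `offset`, that gaps shorter than `min_duration_off` are filled, and that regions shorter than `min_duration_on` are then dropped.

```python
from typing import List, Sequence, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


def binarize(
    scores: Sequence[float],
    frame_step: float,
    onset: float,
    offset: float,
    min_duration_on: float = 0.0,
    min_duration_off: float = 0.0,
) -> List[Region]:
    """Turn per-frame scores into active regions (illustrative sketch)."""
    regions: List[Region] = []
    active = False
    start = 0.0

    # Hysteresis: switch on above `onset`, switch off below `offset`.
    for i, score in enumerate(scores):
        t = i * frame_step
        if not active and score > onset:
            active, start = True, t
        elif active and score < offset:
            regions.append((start, t))
            active = False
    if active:  # close a region still open at the end of the signal
        regions.append((start, len(scores) * frame_step))

    # Fill inactive gaps shorter than `min_duration_off`.
    merged: List[Region] = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)

    # Drop active regions shorter than `min_duration_on`.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


# Toy scores at one frame per second (made-up values, not model output).
scores = [0.1, 0.8, 0.6, 0.3, 0.2, 0.9, 0.1]
print(binarize(scores, frame_step=1.0, onset=0.7, offset=0.4))
# [(1.0, 3.0), (5.0, 6.0)]
```

Using a lower `offset` than `onset` keeps a region open through small dips in the score, which is why the first region above extends past the frame scoring 0.6.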
reproducible_research/expected_outputs/ (the 20 `.rttm` files listed above)
CHANGED
The diff for each of these files is too large to render. See raw diff.
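Since the RTTM diffs are not rendered, one way to inspect the expected outputs is to load them locally. The sketch below is a minimal reader for the standard space-separated RTTM layout (`SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`); the helper name `parse_rttm` and the sample lines are hypothetical and are not taken from the repository.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Turn = Tuple[float, float, str]  # (start, end, speaker)


def parse_rttm(text: str) -> Dict[str, List[Turn]]:
    """Group SPEAKER turns by file identifier (illustrative sketch)."""
    turns: Dict[str, List[Turn]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip blank, malformed, or non-SPEAKER lines.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        file_id = fields[1]
        start = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        turns[file_id].append((start, start + duration, speaker))
    return dict(turns)


# Made-up sample content for illustration only.
sample = (
    "SPEAKER meeting1 1 0.50 1.25 <NA> <NA> spk0 <NA> <NA>\n"
    "SPEAKER meeting1 1 2.00 0.50 <NA> <NA> spk1 <NA> <NA>\n"
)
print(parse_rttm(sample))
# {'meeting1': [(0.5, 1.75, 'spk0'), (2.0, 2.5, 'spk1')]}
```

Parsing a reproduced output and the corresponding expected output this way makes it possible to compare them turn by turn.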