PriMock57: A Dataset Of Primary Care Mock Consultations
About
Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, thus slowing down normal research practices. We detail the development of a public-access, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.
Alex Papadopoulos Korfiatis, Francesco Moramarco, Radmila Sarac, Aleksandar Savkov • 2022
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Transcript Alignment | Common Voice English 8 (test) | Character GLE | 65.8 | 16 |
| Transcript Alignment | PriMock57 (PM57) 1 (test) | Character GLE | 76.7 | 16 |
| Transcript Alignment | TED-LIUM v3 (test) | Character GLE | 78.1 | 16 |
| Speech Alignment | Common Voice Portuguese | Character GLE | 59.2 | 3 |
| Speech Alignment | Common Voice Spanish | Character GLE (%) | 60.9 | 3 |
| Speech Alignment | Common Voice Turkish | Character GLE | 40.4 | 3 |
| Speech Alignment | Common Voice German | Character GLE (%) | 47 | 3 |
| Speech Alignment | Common Voice Polish | Character GLE | 54 | 3 |
| Speech Alignment | Common Voice Indonesian | Character GLE | 56.5 | 3 |
| Speech Alignment | Common Voice Swahili | Character GLE | 45.3 | 3 |