PriMock57: A Dataset Of Primary Care Mock Consultations

About

Recent advances in Automatic Speech Recognition (ASR) have made it possible to reliably produce automatic transcripts of clinician-patient conversations. However, access to clinical datasets is heavily restricted due to patient privacy, which slows down normal research practices. We detail the development of a public-access, high-quality dataset comprising 57 mocked primary care consultations, including audio recordings, their manual utterance-level transcriptions, and the associated consultation notes. Our work illustrates how the dataset can be used as a benchmark for conversational medical ASR as well as consultation note generation from transcripts.

Alex Papadopoulos Korfiatis, Francesco Moramarco, Radmila Sarac, Aleksandar Savkov • 2022

Related benchmarks

Task	Dataset	Metric	Result
Transcript Alignment	Common Voice English 8 (test)	Character GLE	65.8	16
Transcript Alignment	PriMock57 (PM57) 1 (test)	Character GLE	76.7	16
Transcript Alignment	TED-LIUM v3 (test)	Character GLE	78.1	16
Dial-2-Note	Dial-2-Note (test)	BLEU	2	9
Note-2-Dial	Note-2-Dial (test)	BLEU	0.01	9
Speech Alignment	Common Voice Portuguese	Character GLE	59.2	3
Speech Alignment	Common Voice Spanish	Character GLE (%)	60.9	3
Speech Alignment	Common Voice Turkish	Character GLE	40.4	3
Speech Alignment	Common Voice German	Character GLE (%)	47	3
Speech Alignment	Common Voice Polish	Character GLE	54	3

Showing 10 of 13 rows
