xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)
Ng et al. BMC Medical Informatics and Decision Making
(2025) 25:236
Key Findings Novel Features
The study highlights low recall rates for clinical concepts in primary care settings. This could be improved by incorporating better domain-specific vocabulary and adapting ASR engines specifically for medical language.
Recognizing the differences in error rates between doctors and patients, the study recommends fine-tuning ASR systems for diverse speaker roles and ensuring better handling of complex dialogues between multiple speakers.
NR
The achievable performance of contemporary ASR engines, when applied to conversational clinical speech as measured by WER and clinical concept extraction, is disappointing, with WER of approximately 50% and concept extraction rates of approximately 60%. There are a limited number of use cases where this level of performance is adequate.
AI Transcription Proficiency (paper specific outcomes), by Performance Metric (F1 score, Precision, Recall, WER)

Subcategories                                 F1       Precision  Recall   WER
Bing Speech API (BING)                        > 0.60   > 0.80     > 0.40   49%
Google Cloud Speech API (Google)              > 0.60   > 0.80     > 0.60   44%
IBM Speech to Text (IBM)                      > 0.60   > 0.60     > 0.60   38%
Azure Media Indexer (MAVIS)                   > 0.60   > 0.60     > 0.60   41%
Azure Media Indexer 2 - Preview (MAVIS v2)    > 0.60   > 0.80     > 0.60   35%
Nuance.SpeechAnywhere (Nuance)                > 0.60   > 0.80     > 0.40   58%
Amazon Transcribe Preview (Transcribe)        > 0.60   > 0.80     > 0.60   NR
Mozilla DeepSpeech (DeepSpeech)               > 0.40   > 0.40     > 0.20   65%

Gold Standard / Comparator Type
Professionally transcribed and annotated recordings with speaker and time index.
Extracted clinical concepts using the same commercially available NLP engine and open-source NLP.
Table 2 (continued)
Study Reference: Kodish-Wachs et al., 2018 [17]
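The metrics reported in this table row (WER, precision, recall, F1) can be illustrated with a minimal sketch. This is not the evaluation pipeline of the cited study: the function names `word_error_rate` and `concept_scores` are hypothetical, the tokenization is assumed to be lowercased whitespace splitting, and clinical concepts are assumed to be compared as exact strings. The example transcripts and concept sets are invented for illustration only.

```python
from __future__ import annotations


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are lowercased and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def concept_scores(gold: set[str], predicted: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match concept sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example data, not taken from the study.
wer = word_error_rate(
    "the patient reports chest pain since monday",
    "the patient report chest pains monday",
)
precision, recall, f1 = concept_scores(
    gold={"chest pain", "hypertension", "metformin", "dyspnea"},
    predicted={"chest pain", "metformin", "asthma"},
)
print(f"WER={wer:.2f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case two substitutions and one deletion against a seven-word reference give a WER of 3/7, and two of the four gold concepts are recovered among three predictions. It shows how a transcript that reads as "mostly right" can still carry a high WER and miss half of the clinical concepts.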