xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)


Ng et al. BMC Medical Informatics and Decision Making (2025) 25:236, Page 13 of 24

Key Findings Novel Features

The study highlights low recall rates for clinical concepts in primary care settings. This could be improved by incorporating better domain-specific vocabulary and adapting automatic speech recognition (ASR) engines specifically for medical language. Recognizing the differences in error rates between doctors and patients, the study recommends fine-tuning ASR systems for diverse speaker roles and ensuring better handling of complex dialogues between multiple speakers.

NR

The achievable performance of contemporary ASR engines, when applied to conversational clinical speech as measured by word error rate (WER) and clinical concept extraction, is disappointing, with a WER of approximately 50% and concept extraction rates of approximately 60%. There is a limited number of use cases where this level of performance is adequate.
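For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the ASR output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that conventional definition, not the scoring pipeline used in the cited study; the function name `word_error_rate` and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenization is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two substitutions and one deletion against six reference words.
example_wer = word_error_rate(
    "patient reports chest pain since monday",
    "patient report chest pains since",
)
```

Under this definition, a WER of approximately 50% means that roughly one word-level error occurs for every two words of the reference transcript.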

AI Transcription Proficiency (paper specific outcomes)

Metric (F1 score, Precision, Recall, WER)

F1: > 0.60, > 0.60, > 0.60, > 0.40

Precision: > 0.80, > 0.80, > 0.60, > 0.80, > 0.40

Recall: > 0.40, > 0.60, > 0.40, > 0.60, > 0.20

WER: 49%, 44%, 38%, 41%, 35%, 58%, 65%
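Precision, recall, and F1 for clinical concept extraction can be illustrated with set comparisons. This is a minimal sketch that assumes concepts are compared as exact-match strings between the gold-standard transcript and the ASR transcript. The function name `concept_scores` and the concept lists are hypothetical and are not taken from the cited study.

```python
def concept_scores(gold: set, extracted: set) -> dict:
    """Precision, recall, and F1 over exact-match concept sets.

    Illustrative helper (hypothetical name). Exact string matching is an
    assumption of this sketch; real pipelines usually normalize concepts first.
    """
    true_positives = len(gold & extracted)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up concept sets: 3 of 5 gold concepts recovered, plus 1 spurious concept.
gold_concepts = {"hypertension", "metformin", "chest pain", "asthma", "cough"}
asr_concepts = {"hypertension", "metformin", "chest pain", "migraine"}
scores = concept_scores(gold_concepts, asr_concepts)
```

In this made-up case precision is 0.75 and recall is 0.60, which shows how an engine can report mostly correct concepts while still missing a large share of those in the gold standard.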

Gold standard

- Bing Speech API (BING),
- Google Cloud Speech API (Google),
- IBM Speech to Text (IBM),
- Azure Media Indexer (MAVIS),
- Azure Media Indexer 2 - Preview (MAVIS v2),
- Nuance.SpeechAnywhere (Nuance),
- Amazon Transcribe Preview (Transcribe),
- Mozilla DeepSpeech (DeepSpeech)

Comparator Type Subcategories Performance

Standard

Professionally transcribed and annotated recordings with speaker and time index. Extracted clinical concepts using the same commercially available NLP engine and open source NLP.

Table 2 (continued)

Study Reference: Kodish-Wachs et al., 2018 [17]
