xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)

Ng et al. BMC Medical Informatics and Decision Making (2025) 25:236


Key Findings / Novel Features

The study points out the need for systems that automatically flag likely errors in ASR output. Such systems would reduce the workload of manual editing and improve transcript quality.
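One simple way to flag likely errors is to mark words whose recognizer confidence falls below a threshold. The sketch below is for illustration only. The `AsrToken` structure, the per-word confidence scores, and the 0.6 threshold are hypothetical assumptions and do not come from the reviewed study. Real ASR engines expose confidence in different ways, and a confidence threshold cannot catch errors that the recognizer is confident about.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AsrToken:
    """A recognized word plus a confidence score in [0, 1].
    Hypothetical structure; real ASR engines report confidence differently."""
    word: str
    confidence: float


def flag_likely_errors(tokens: List[AsrToken], threshold: float = 0.6) -> List[int]:
    """Return the indices of tokens whose confidence is below `threshold`.
    The 0.6 default is an arbitrary illustrative value."""
    return [i for i, tok in enumerate(tokens) if tok.confidence < threshold]


def render_for_review(tokens: List[AsrToken], flagged: List[int]) -> str:
    """Wrap flagged words in [[...]] so an editor can jump straight to them."""
    flagged_set = set(flagged)
    return " ".join(
        f"[[{tok.word}]]" if i in flagged_set else tok.word
        for i, tok in enumerate(tokens)
    )


if __name__ == "__main__":
    transcript = [
        AsrToken("patient", 0.97),
        AsrToken("denies", 0.91),
        AsrToken("chest", 0.88),
        AsrToken("pain", 0.95),
        AsrToken("on", 0.42),
        AsrToken("metoprolol", 0.35),
    ]
    print(render_for_review(transcript, flag_likely_errors(transcript)))
    # patient denies chest pain [[on]] [[metoprolol]]
```

The flagged words are only highlighted and nothing is rewritten, so the editor still makes every final decision. This matches the stated goal of reducing editing workload without removing human review.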

Using medical context to detect and correct errors more accurately, especially for disrupted speech and ambiguous terms, is recommended to enhance transcription quality.
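As a rough illustration of bringing in medical knowledge, the sketch below checks each word against a small medical lexicon. When Python's `difflib.get_close_matches` finds a term above a similarity cutoff, the word is replaced with that closest term. The lexicon, the 0.8 cutoff, and the function name are hypothetical. This string-similarity heuristic stands in for illustration only and is not the method of any study in the table. Unlike a genuinely context-aware model, it ignores the surrounding words.

```python
import difflib
from typing import List, Tuple

# Hypothetical mini-lexicon; a real system would draw on a curated
# clinical terminology rather than a hard-coded list.
MEDICAL_LEXICON = ["metoprolol", "metformin", "hypertension", "tachycardia", "dyspnea"]


def correct_with_lexicon(
    words: List[str], lexicon: List[str] = MEDICAL_LEXICON, cutoff: float = 0.8
) -> Tuple[List[str], List[Tuple[int, str, str]]]:
    """Replace a word with its closest lexicon term when difflib's similarity
    ratio is at least `cutoff`; otherwise keep it. Also return the list of
    (index, original, replacement) substitutions for human review."""
    corrected: List[str] = []
    changes: List[Tuple[int, str, str]] = []
    for i, word in enumerate(words):
        lowered = word.lower()
        if lowered in lexicon:
            corrected.append(word)  # already a known term
            continue
        match = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
        if match:
            corrected.append(match[0])
            changes.append((i, word, match[0]))
        else:
            corrected.append(word)  # no close term: leave untouched
    return corrected, changes


if __name__ == "__main__":
    words = ["patient", "has", "tachycardea", "on", "metoprolol"]
    fixed, changes = correct_with_lexicon(words)
    print(" ".join(fixed))  # patient has tachycardia on metoprolol
    print(changes)          # [(2, 'tachycardea', 'tachycardia')]
```

A fuzzy match can also turn a valid word into the wrong term, which is risky for similarly spelled drug names. For that reason the function returns every substitution for review instead of applying changes silently.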

Automated error detection and correction in transcription is needed because current systems often produce clinically significant errors that could affect patient care. The study also suggests incorporating multiple rounds of annotation and validation to improve transcription accuracy. Training on domain-specific terminologies and adding more advanced error correction models may mitigate transcription inaccuracies, particularly for complex medical terms.
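Transcription accuracy is often summarized as word error rate (WER), one of the metrics named in Table 2. WER is the number of word substitutions, deletions, and insertions needed to turn the ASR output into a reference transcript, divided by the number of reference words. Below is a minimal Python sketch that uses a word-level edit-distance table. The function name and example sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "start metoprolol 25 mg daily"
    print(word_error_rate(reference, "start metformin 25 mg daily"))  # 0.2
    print(word_error_rate(reference, "start metformin 25 daily"))     # 0.4
```

WER weights every error equally. The drug-name substitution in the first example therefore counts the same as a harmless filler-word error would, which is one reason for targeting clinically significant errors specifically.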

Table 2 (continued)

Study reference: Lybarger et al., 2018 [18]
Comparator type: Gold standard (VGEENS)
Comparator performance: NR
Subcategories:
- Sentence-level performance: Structure; Words; combined
- Word-level performance: Words; combined external
Standard: Word-level gold standard labels were determined from the keep or delete labels in the note alignments. Sentence-level gold standard labels were determined as follows: a sentence was labeled delete when all of its word-level labels were delete, and keep when at least one word-level label in the sentence was keep.
AI transcription proficiency (paper-specific outcomes), metric (F1 score, precision, recall, WER):
- Sentence level, Structure: F1 0.39, precision 0.35, recall 0.44
- Sentence level, Words: F1 0.44, precision 0.37, recall 0.55
- Sentence level, combined: F1 0.48, precision 0.40, recall 0.59
- Word level, Words: F1 0.28, precision 0.23, recall 0.36
- Word level, combined external: F1 0.31, precision 0.25, recall 0.42
Key findings: A substantial number of sentence- and word-level edits in inpatient progress notes can be detected automatically with a small false detection rate. These promising results warrant further exploration, including the procurement of additional training data.
Novel features: NR

Study reference: Zhou et al., 2018 [19]
Comparator type: Gold standard
Comparator performance: NR
Subcategories:
- Trans_BBN
- Trans_I2B2
Standard: NICTA Synthetic Nursing Handover Data in written and voice form.
AI transcription proficiency (paper-specific outcomes), metric (F1 score, precision, recall, WER):
- Trans_BBN: F1 0.416, precision 0.498, recall 0.419
- Trans_I2B2: F1 0.392, precision 0.481, recall 0.390
Key findings: A deep learning (DL) system that takes pretrained word representations as input and applies the proposed transfer learning technique achieves better performance. Transferring knowledge from general deep models to specific healthcare tasks yields a significant improvement.
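The sentence-level labeling rule described in the table above can be written down directly. The sketch pairs it with the precision, recall, and F1 computation used to score a detector against those labels. It also checks that an F1 value equals the harmonic mean of a listed precision and recall; for example, 0.40 and 0.59 give 0.48. This is a minimal sketch with several assumptions for illustration: the label strings, the choice of delete as the positive class (the edit to be detected), and the example data. None of these are taken from the study.

```python
from typing import Sequence, Tuple

KEEP, DELETE = "keep", "delete"


def sentence_label(word_labels: Sequence[str]) -> str:
    """Derive a sentence label from its word labels: 'delete' only when every
    word is 'delete', otherwise 'keep' (at least one word was kept)."""
    if not word_labels:
        raise ValueError("a sentence needs at least one word-level label")
    return DELETE if all(label == DELETE for label in word_labels) else KEEP


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(
    gold: Sequence[str], pred: Sequence[str], positive: str = DELETE
) -> Tuple[float, float, float]:
    """Score predicted labels against gold labels, treating `positive`
    (assumed here to be 'delete', i.e. an edit) as the class to detect."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted label lists differ in length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, harmonic_f1(precision, recall)


if __name__ == "__main__":
    # Hypothetical word-level gold labels for three sentences.
    gold_words = [
        [DELETE, DELETE],      # all words deleted -> 'delete'
        [KEEP, DELETE, KEEP],  # at least one word kept -> 'keep'
        [DELETE],              # -> 'delete'
    ]
    gold = [sentence_label(words) for words in gold_words]
    pred = [DELETE, DELETE, KEEP]  # hypothetical detector output
    print(gold)                               # ['delete', 'keep', 'delete']
    print(precision_recall_f1(gold, pred))    # (0.5, 0.5, 0.5)
    # Consistency check on a precision/recall pair reported in the table.
    print(round(harmonic_f1(0.40, 0.59), 2))  # 0.48
```

Sentence labels are derived from word labels rather than annotated separately, so the two levels of evaluation stay consistent with each other by construction.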
