xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)
Ng et al. BMC Medical Informatics and Decision Making
(2025) 25:236
Page 14 of 24
Key Findings Novel Features
The study points out the need for systems to automatically flag likely errors in ASR output, which would help reduce the workload of manual editing and improve transcript quality.
Using medical context to detect and correct errors more accurately, especially for disrupted speech and ambiguous terms, is recommended for enhancing transcription quality.
Need for automated error detection and correction in transcription, as current systems often produce clinically significant errors that could impact patient care. The study also suggests incorporating multiple rounds of annotation and validation processes to improve transcription accuracy.
Training on domain-specific terminologies and including more advanced error correction models may mitigate transcription inaccuracies, particularly for complex medical terms.
NR
A substantial number of sentence- and word-level edits in inpatient progress notes can be automatically detected with a small false detection rate; promising results that warrant further exploration, including the procurement of additional training data.
NR
NR
DL system using pretrained word representations as the input, and the proposed transfer learning technique, is able to achieve better performance. Transferring knowledge from general deep models to specific tasks in healthcare helps gain a significant improvement.
AI Transcription Proficiency (paper-specific outcomes)
Performance Metric (F1 score, Precision, Recall, WER) by Subcategories

Study   Subcategories                            F1      Precision   Recall
[18]    Sentence-level performance: Structure    0.39    0.35        0.44
[18]    Sentence-level performance: Words        0.44    0.37        0.55
[18]    Sentence-level performance: Combined     0.48    0.40        0.59
[18]    Word-level performance: Words            0.28    0.23        0.36
[18]    Word-level performance: Combined         0.31    0.25        0.42
[19]    Trans_BBN                                0.416   0.498       0.419
[19]    Trans_I2B2                               0.392   0.481       0.390

Comparator Type: external gold standard; VGEENS gold standard
Gold Standard
Word-level gold standard labels were determined based on the keep or delete labels from the note alignments. Sentence-level gold standard labels were determined as follows: sentences were labeled as delete when all word-level labels were delete, and sentences were labeled as keep when at least one word-level label in the sentence was kept.
NICTA Synthetic Nursing Handover Data in written and voice form.
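The sentence-level labeling rule stated in the gold standard description (delete only when every word-level label is delete, keep otherwise) can be illustrated with a minimal Python sketch. The function name `sentence_label` and the string constants are hypothetical and chosen for illustration only; this is not the authors' code.

```python
from typing import List

# Hypothetical label names, mirroring the "keep" / "delete" wording of the table.
KEEP = "keep"
DELETE = "delete"


def sentence_label(word_labels: List[str]) -> str:
    """Derive a sentence-level label from its word-level labels.

    A sentence is labeled delete only when all of its word-level labels
    are delete; a single keep label makes the whole sentence keep.
    """
    if not word_labels:
        # Assumption: the rule is undefined for empty sentences, so reject them.
        raise ValueError("a sentence needs at least one word-level label")
    if all(label == DELETE for label in word_labels):
        return DELETE
    return KEEP
```

For example, `sentence_label(["delete", "keep", "delete"])` yields a keep label, while a sentence whose words are all labeled delete yields delete.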
Table 2 (continued)
Study Reference: Lybarger et al., 2018 [18]; Zhou et al., 2018 [19]
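As a quick arithmetic check on the sentence-level and word-level rows, the reported F1 values can be compared with the harmonic mean of the reported precision and recall. The sketch below assumes the standard definition F1 = 2PR / (P + R) and pairs the values in the order they appear in the table; the names `f1_score` and `REPORTED` are hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (row, precision, recall, reported F1), copied from the table in reading order.
REPORTED = [
    ("sentence-level, structure", 0.35, 0.44, 0.39),
    ("sentence-level, words", 0.37, 0.55, 0.44),
    ("sentence-level, combined", 0.40, 0.59, 0.48),
    ("word-level, words", 0.23, 0.36, 0.28),
    ("word-level, combined", 0.25, 0.42, 0.31),
]

for row, precision, recall, reported in REPORTED:
    computed = round(f1_score(precision, recall), 2)
    status = "matches" if abs(computed - reported) < 1e-9 else "differs"
    print(f"{row}: computed {computed:.2f}, reported {reported:.2f} ({status})")
```

All five rows agree with the reported F1 to two decimals. The Trans_BBN and Trans_I2B2 values do not satisfy the same identity as printed, so they are left out of the check; the table does not say how they were aggregated.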