xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)
Ng et al. BMC Medical Informatics and Decision Making
(2025) 25:236
Key Findings
- Mozilla DeepSpeech outperforms CMU Sphinx in clinical transcription accuracy, with notable improvements in WER when using custom-trained language models. Short, variable-length audio recordings, split on detected silences, also demonstrate transcription accuracy comparable to full-length recordings. DeepSpeech offers faster processing times, indicating its potential for real-time clinical applications, although concerns about generalizability and responsiveness remain.
- AI documentation improved quality, reduced consultation length by 26.3%, and decreased clinician task load.
- Most physicians felt DAX Copilot reduced their workload, especially those who dictated notes after work. The AI tool allowed for more patient engagement during visits. Not all encounters were suitable for AI documentation; some physicians found it useful for complex visits, while others preferred it for routine ones. Errors in transcription and AI-generated content required physician review and edits. Some physicians worried that DAX Copilot’s efficiency would lead to higher patient loads.
- DAX users reported reduced documentation stress, improved accuracy, and increased patient satisfaction.
- AI documentation showed no significant benefit to patient experience or productivity but improved provider engagement.
Novel Features
- There is a need for custom-trained language models, especially for clinical environments. Custom LMs significantly improve accuracy over generic models, suggesting that transcription accuracy can be further enhanced by incorporating clinician-specific vocabulary and data. Continuous adaptation of the language model with newly transcribed data could address current limitations in handling various speech styles, accents, and clinical terminologies, especially in emergency settings where rapid and context-specific language use is prevalent.
- Utilized a pipeline of AI models, including LLMs and speech-to-text transcription.
- Explored real-time AI-driven clinical documentation using GPT-4.
- Highlighted issues such as AI errors, over-documentation, and potential physician burnout risks.
- Findings contributed to Atrium Health’s decision to expand its use to 2,500 licenses across specialties.
- AI scribe adapted to clinician habits over time; specialty-specific notes.
- Examined CPT code submission rates and documentation timeliness.
AI Transcription Proficiency (paper-specific outcomes)
Metric (F1 score, Precision, Recall, WER): WER
- CMU Sphinx: 0.7 (baseline), 0.41 (trained); 0.76 (baseline), 0.57 (trained); 0.53 (baseline), 0.38 (trained)
- Mozilla DeepSpeech: 0.48 (baseline), 0.28 (trained); 0.71 (baseline), 0.43 (trained); 0.46 (baseline), 0.28 (trained)
Gold standard
- Full length
- Short, fixed-length
- Short, variable-length
Clinical documentation quality
- NR
- AI-produced documentation had higher SAIL scores
Productivity metrics
- NR
- Time spent per day in EMRs decreased from 90.1 to 70.3 min
Provider engagement and documentation burden
- NR
- Positive trend in engagement but increased after-hours EHR time
Comparator (Type, Subcategories, Performance)
- Standard EHR documentation
- Bundy et al., 2024 [23] NR Traditional manual documentation (dictation, typing)
- Pre-DAX versus Post-DAX
- Control versus AI-assisted
- Standard written files of the dataset were used for comparison
- Sheffield Assessment Instrument for Letters
- Time in notes per visit/week
- Provider engagement survey
Table 2 (continued)
Study Reference
- Van Woensel et al., 2022 [22]
- Balloch et al., 2024 [4]
- Cao et al., 2024 [24]
- Harbele et al., 2024 [25]