xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)


Ng et al. BMC Medical Informatics and Decision Making (2025) 25:236, page 16 of 24

Key Findings | Novel Features

There is a need for custom-trained language models (LMs), especially for clinical environments. Custom LMs significantly improve accuracy over generic models, suggesting that transcription accuracy can be further enhanced by incorporating clinician-specific vocabulary and data. Continuous adaptation of the language model with newly transcribed data could address current limitations in handling various speech styles, accents, and clinical terminology, especially in emergency settings, where rapid, context-specific language use is prevalent.
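To make continuous adaptation concrete, here is a minimal sketch, assuming a simple unigram model: a generic word-frequency model is linearly interpolated with a clinician-specific model whose counts are updated after each newly transcribed note. The class name `AdaptiveUnigramLM` and the interpolation `weight` are hypothetical and not taken from the reviewed studies. Speech recognizers such as DeepSpeech and CMU Sphinx typically decode with higher-order n-gram models, so this only illustrates the interpolation principle.

```python
from collections import Counter


class AdaptiveUnigramLM:
    """Hypothetical sketch: generic unigram LM interpolated with a
    clinician-specific unigram LM that grows with each new transcript."""

    def __init__(self, generic_counts: dict[str, int], weight: float = 0.5) -> None:
        # weight = share of probability mass given to the clinician-specific model
        self.generic = Counter(generic_counts)
        self.generic_total = sum(self.generic.values())
        self.custom = Counter()
        self.weight = weight

    def update(self, transcript: str) -> None:
        # Continuous adaptation: fold newly transcribed words into the custom counts
        self.custom.update(transcript.lower().split())

    def prob(self, word: str) -> float:
        word = word.lower()
        p_generic = self.generic[word] / self.generic_total if self.generic_total else 0.0
        custom_total = sum(self.custom.values())
        if custom_total == 0:
            # No clinician data yet: fall back to the generic model alone
            return p_generic
        p_custom = self.custom[word] / custom_total
        # Linear interpolation of the two models
        return self.weight * p_custom + (1 - self.weight) * p_generic
```

For example, a clinical term absent from the generic counts has probability 0.0 until the first transcript containing it is added through `update`; after that it receives mass from the clinician-specific model while common words keep their generic share.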

Utilized a pipeline of AI models, including large language models (LLMs) and speech-to-text transcription.

Explored real-time AI-driven clinical documentation using GPT-4.

Highlighted issues such as AI errors, overdocumentation, and potential physician burnout risks.

Findings contributed to Atrium Health’s decision to expand its use to 2,500 licenses across specialties.

AI scribe adapted to clinician habits over time; specialty-specific notes.

Examined CPT code submission rates and documentation timeliness.

NR

Mozilla DeepSpeech outperforms CMU Sphinx in clinical transcription accuracy, with notable improvements in word error rate (WER) when custom-trained language models are used. Short, variable-length audio recordings, split on detected silences, achieve transcription accuracy comparable to full-length recordings. DeepSpeech also offers faster processing times, indicating its potential for real-time clinical applications, although concerns about generalizability and responsiveness remain.
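Splitting recordings on detected silences can be illustrated with a simple energy-threshold detector. This is a minimal sketch under stated assumptions: frame energy is the mean squared amplitude, and a run of quiet frames ends the current segment. The function name and its parameters (`frame_len`, `threshold`, `min_silence_frames`) are hypothetical; the segmentation method used in the reviewed study is not described here.

```python
def split_on_silence(
    samples: list[float], frame_len: int, threshold: float, min_silence_frames: int
) -> list[tuple[int, int]]:
    """Hypothetical energy-based splitter: returns (start, end) sample indices
    of variable-length, non-silent segments."""
    segments: list[tuple[int, int]] = []
    seg_start = None   # start index of the segment currently being built
    quiet_run = 0      # number of consecutive quiet frames seen so far
    last_loud_end = 0  # end index of the most recent loud frame
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy >= threshold:
            if seg_start is None:
                seg_start = start  # speech begins a new segment
            quiet_run = 0
            last_loud_end = start + len(frame)
        else:
            quiet_run += 1
            # A long enough silence closes the open segment at the last loud frame
            if seg_start is not None and quiet_run >= min_silence_frames:
                segments.append((seg_start, last_loud_end))
                seg_start = None
    if seg_start is not None:
        segments.append((seg_start, last_loud_end))  # flush a trailing segment
    return segments
```

For example, with 2-sample frames, a threshold of 0.01, and a minimum of two quiet frames, the signal `[0.5]*4 + [0.0]*4 + [0.5]*2` splits into `[(0, 4), (8, 10)]`, while a single quiet frame between loud frames does not cause a split. Each segment would then be transcribed separately.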

AI documentation improved quality, reduced consultation length by 26.3%, and decreased clinician task load.

NR NR NR

Most physicians felt DAX Copilot reduced their workload, especially those who dictated notes after work. The AI tool allowed for more patient engagement during visits.

Not all encounters were suitable for AI documentation: some physicians found it useful for complex visits, while others preferred it for routine ones.

Errors in transcription and AI-generated content required physician review and edits.

Some physicians worried that DAX Copilot’s efficiency would lead to higher patient loads.

DAX users reported reduced documentation stress, improved accuracy, and increased patient satisfaction.

AI documentation showed no significant benefit to patient experience or productivity but improved provider engagement.

AI Transcription Proficiency (paper-specific outcomes)

Clinical documentation quality: NR; AI-produced documentation had higher SAIL scores.

Productivity metrics: NR; time spent per day in EMRs decreased from 90.1 to 70.3 min.

Provider engagement and documentation burden: NR; positive trend in engagement but increased after-hours EHR time.

Metric (F1 score, Precision, Recall, WER): WER

Comparator: gold standard, by recording type. Values are WER as baseline / trained (lower is better):

Engine              Full-length    Short, fixed-length    Short, variable-length
CMU Sphinx          0.70 / 0.41    0.76 / 0.57            0.53 / 0.38
Mozilla DeepSpeech  0.48 / 0.28    0.71 / 0.43            0.46 / 0.28
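WER, the metric reported for the transcription engines, is the word-level edit distance between a transcript and its reference divided by the number of reference words, (S + D + I) / N for substitutions, deletions, and insertions. A minimal Python sketch, assuming whitespace tokenization and no text normalization (both simplifications; the function names are hypothetical):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Computed as a word-level Levenshtein distance; whitespace tokenization only.
    The result can exceed 1.0 when the hypothesis contains many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous DP row: prev[j] is the edit distance between the
    # reference words processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, trained: float) -> float:
    """Fractional drop from a baseline value to a trained value (e.g. WER)."""
    return (baseline - trained) / baseline
```

For instance, `word_error_rate("patient denies chest pain", "patient denies chest pains")` returns 0.25 (one substitution in four reference words). Applied to the Mozilla DeepSpeech pair 0.48 (baseline) and 0.28 (trained), `relative_reduction` gives about 0.42, a relative WER reduction of roughly 42%; the same arithmetic on daily EMR time, (90.1 - 70.3) / 90.1, gives about 0.22.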

Comparator Type | Subcategories | Performance

Standard EHR documentation

Bundy et al., 2024 [23]: NR; traditional manual documentation (dictation, typing)

Pre-DAX versus post-DAX

Control versus AI-assisted

Standard

Written files of the dataset were used for comparison

Sheffield Assessment Instrument for Letters (SAIL)

Time in notes per visit/week

Provider engagement survey

Table 2 (continued)

Study Reference: Van Woensel et al., 2022 [22]; Balloch et al., 2024 [4]; Cao et al., 2024 [24]; Harbele et al., 2024 [25]
