xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)

Ng et al. BMC Medical Informatics and Decision Making (2025) 25:236, Page 18 of 24

Automatic summaries had higher word count and lower lexical diversity.

Key Findings Novel Features

Used both open-label and masked study designs

Two-stage evaluation: ROUGE metric comparison and manual annotation for information recall accuracy

Collaboration between the system and students leads to the best results, with a decrease in time spent on summarising in combination with a similar quality when compared to manual summarisation.

Study was independently conducted and focused on real-world usability and safety concerns rather than just technical accuracy.

AI tools require continuous independent testing, as when tested with real-world scenarios, many errors arose that could compromise patient safety. Given that proprietary AI algorithms frequently evolve, ongoing safety assessments are essential.

Mixed-method evaluation including survey feedback and usability scales

Integrated AI-generated SmartSections in EHR workflows

Patients perceived less clinician distraction with AI scribing, but no improvement in perceived engagement

Fine-tuned models significantly outperformed zero-shot models; BART-Large-CNN was best for ED consultations

The study explores the impact of a digital scribe system on the clinical documentation process, demonstrating the use of the system in reducing summarization time while maintaining summary quality through collaborative editing. This study highlights the potential of digital scribe systems to address the challenges of clinical documentation.

Omission-type errors, whereby the tool leaves out key information from its response, were the most common, which poses safety risks; clinicians may struggle to identify omission errors due to reliance on memory recall.

Standardised evaluation frameworks and real-world testing are required to mitigate AI-related safety concerns.

AI-assisted documentation reduced after-hours work by 30% and increased same-day appointment closure by 9.3%.

AI scribing led to significant time savings but had variability across users.

AI Transcription Proficiency (paper-specific outcomes)

Patient satisfaction and engagement

NR

No significant difference in Patient-Doctor Relationship Questionnaire-9 (PDRQ-9) scores

BART-Large-CNN had highest performance: ROUGE-1 F1 = 0.49, ROUGE-2 F1 = 0.23, ROUGE-L F1 = 0.35

Modified Physician Documentation Quality Instrument (PDQI-9), overall (IQR): Manual: 31 (27–33)*; AS edited: 29 (26–33)*; AS: 25 (22–28)*; P value: <0.001

Errors by ADS (mean) 44 5 5 9.5

and clinician burden

NR

20.4% less time spent in notes per appointment

Time spent on notes, documentation burden

NR

Median daily documentation reduced by 6.89 min

Metric (F1 score, Precision, Recall, WER)

ROUGE-1, ROUGE-2, ROUGE-L

Recall-Oriented Understudy for Gisting Evaluation-1 (ROUGE-1) F1 score in % (IQR):

47.3 (42.5–56.4)

40.6 (35.0–45.4)

32.3 (27.0–37.4)

P value: <0.001

WER (SD): 2.9 (2.7)

Summarization accuracy and recall

Automatic summaries (edited by humans)

Automatic summaries

Omission, addition, wrong output, misplaced/irrelevant text

Comparator Type Subcategories Performance

Zero-shot versus fine-tuned models

NR With versus without DAX

Blinded comparison Manual

Comparison of ADS-generated notes with expert-reviewed transcripts

Pre-post intervention Time efficiency Baseline versus AI intervention

Standard

Nurse summary notes from EMR

Highest scoring manual summary

Expert-reviewed transcripts from real patient encounters

Time in notes per appointment

EHR usage time metrics

Table 2 (continued)

Study Reference

Owens et al., 2024 [29]

Sezgin et al., 2024 [30]

van Buchem et al., 2024 [31]

Biro et al., 2025 [32]

Duggan et al., 2025 [33]

Ma et al., 2025 [34]
