xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)

Ng et al. BMC Medical Informatics and Decision Making

(2025) 25:236

Fig. 3 Stacked bar chart for applicability concerns based on QUADAS-2 tool

of standalone SR tools. While these technologies differ in their underlying architectures, their shared aim is to automate or accelerate clinical documentation by converting speech into text, extracting medically relevant content, and in some cases summarizing or repurposing notes for different uses. Such variety in technological approaches was mirrored by the diversity of clinical settings in which these tools were tested. Some studies relied on synthetic data sets, such as nursing handover records, while others evaluated real-world interactions in high-pressure environments like the ED. This variety demonstrates the adaptability of AI transcription solutions but also reveals that performance is heavily context dependent. Tools that excel in structured, repetitive ED workflows may struggle with varied discussions in multi-specialty clinics or with more complex, freeform patient-doctor dialogues. Likewise, whether a study used real or simulated encounters influenced performance, as differences in setting and complexity affect metrics such as WER or F1 scores.
In general, the accuracy of AI-driven transcription remains mixed. WER varied from as low as 0.087 in highly controlled settings (Issenman et al. [12]) to over 50% in conversational or multi-speaker encounters (Kodish-Wachs et al. [17]). Some tools achieved more favorable precision/recall in domain-specific contexts, particularly when leveraging specialty vocabularies (e.g., Happe et al. [10], Suominen et al. [15]). However, others (e.g., Lybarger et al. [18]) highlighted persistent transcription errors that require substantial manual correction. Performance estimates from older systems should also be interpreted cautiously, as rapid technological advances, particularly in neural network and transformer-based models, have likely rendered those results outdated or less generalizable to current AI transcription capabilities.
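For readers less familiar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions and insertions needed to turn the system transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are assumptions made for illustration; the included studies may have normalized text or counted errors differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption for illustration: text is lowercased and split on whitespace.
    Individual studies may apply other normalization rules.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences, not drawn from any included study.
reference = "patient denies chest pain or shortness of breath"
hypothesis = "patient denies chest pain and shortness breath"
print(word_error_rate(reference, hypothesis))  # one substitution + one deletion over 8 words
```

Because insertions are counted against the reference length, WER computed this way can exceed 1.0 (100%) for transcripts with many spurious words, which is one reason values from different studies and settings are not directly comparable.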
Besides accuracy, studies conveyed mixed evidence concerning time efficiency. Zick et al. and Issenman et al. both reported substantial reductions in documentation turnaround times [9, 12], whereas newer research from Blackley et al. and Hodgson et al. found negligible or even negative impacts once clinicians’ editing tasks were factored in [16, 20]. Similarly, cost analyses yield no consensus. Zick et al. posited that voice recognition could be up to 100 times less expensive than manual transcription [9], but Issenman et al. found it to be more costly in a pediatric gastroenterology context [12]. As these examples illustrate, site-specific factors, such as the prevalence of templated text, local staff costs, and volume of standard phrases, likely determine the effectiveness of AI transcription.
Although AI transcription systems do not directly deliver patient care, they can indirectly influence clinical outcomes by improving documentation completeness and quality. Almario et al. reported a higher identification of red-flag symptoms in AI-drafted notes [13, 14], while others showed that accurate automated transcription may reduce cognitive load on clinicians. Nevertheless, any potential gains are offset by persistent concerns about error rates. High WER or omissions, as highlighted by Kodish-Wachs et al. [17], remain a threat to real-time decision-making, and this problem has not disappeared with the advent of LLM-based scribes, as seen in Bundy et al., van Buchem et al. and Biro et al. [23, 31, 32]. The subsequent post-editing burden also continues to challenge clinicians’ time management, particularly in busy and dynamic outpatient settings with a variety of patient presentations. Moreover, the efficiency gains from AI transcription are not guaranteed and initial investment costs can be prohibitive [37].
Clinicians’ opinions, acceptance and burnout also surfaced as important considerations for AI adoption. Surveys by Goss et al. [20] and interventions assessed by Misurac et al. [28] revealed that while some clinicians appreciate the potential reduction in documentation burdens, many remain cautious, dissatisfied with high error rates, or concerned about the reliability of AI-generated transcripts.
