xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)
Ng et al. BMC Medical Informatics and Decision Making
(2025) 25:236
Several recurring challenges in this space require further attention. First, transcription accuracy often degrades with longer or more complex audio, which suggests the need for incremental or real-time correction features. Second, accented or non-native speech frequently leads to transcription mistakes, highlighting a need for accent adaptation or multi-accent training modules [11, 20, 22, 38]. While systems like AEGIS and NOMINDEX demonstrated high accuracy in specific clinical environments [39], their performance may not generalize well across diverse settings, particularly where speech patterns differ. This is especially true in multinational healthcare systems or regions with a high percentage of non-native English speakers. Third, the training of specialty-specific AI models can be hampered by privacy concerns, as clinicians and institutions may be reluctant to share sensitive patient transcripts for model fine-tuning. Fourth, only a minority of tools currently offer robust real-time error correction, meaning that any short-term gains in typing speed may be negated by lengthy revision processes.
Beyond technical refinement, the slow pace of AI transcription adoption in health care might reflect deep structural barriers, including regulatory scrutiny over patient safety, a fragmented EHR environment that impedes easy integration, and unclear financial incentives. Moreover, as some studies have indicated (e.g., Issenman et al. [12]), frustrated or unreceptive physicians may be unwilling to incorporate new documentation technologies, especially if these tools require significant training or produce large volumes of errors. Further progress will depend on resolving issues of accuracy, accent variability, system interoperability and cost.
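For readers less familiar with how transcription and summary quality are usually quantified, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, together with a simple unigram-overlap recall in the spirit of ROUGE-1. It is a minimal illustration under stated assumptions: the function names, the whitespace tokenization and the example sentences are hypothetical and are not taken from any study in this review, and published evaluations typically rely on established toolkits with more careful text normalization.

```python
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()  # assumption: naive whitespace tokenization
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def unigram_recall(reference: str, candidate: str) -> float:
    """Share of reference words that also appear in the candidate (ROUGE-1-style recall)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    if not ref_counts:
        raise ValueError("reference must contain at least one word")
    overlap = sum((ref_counts & cand_counts).values())  # clipped word matches
    return overlap / sum(ref_counts.values())


# Hypothetical example: one misrecognized word in an eight-word sentence.
reference = "patient denies chest pain or shortness of breath"
hypothesis = "patient denies chest pain or shortness of breadth"
print(word_error_rate(reference, hypothesis))  # 0.125
print(unigram_recall(reference, hypothesis))   # 0.875
```

A single substituted word here yields a WER of 0.125, yet the error ("breadth" for "breath") changes the clinical meaning, which illustrates why an aggregate error rate alone may understate the revision burden discussed above.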
Future research should also incorporate advanced evaluation metrics, such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy) [40], to systematically assess the quality of AI-generated summaries beyond simple WER. Notably, such summarization workflows, already in limited clinical use in the UK since early 2025 (e.g., Tortus, Heidi) [41], may represent the logical next step for AI documentation: integrating LLMs to summarize and repurpose transcripts without requiring pristine input accuracy. Transcription, in this sense, is merely a transitional stage toward more comprehensive AI scribe solutions that ultimately address patient-clinician interactions holistically.
Limitations of review
This review is not without limitations. Firstly, this review searched three major databases (Medline, Embase and Cochrane Library) as well as grey literature but did not consult IEEE Xplore, which is a key database for engineering and technology-related research, including AI and machine learning. This exclusion may have limited
the review’s ability to capture relevant studies on AI transcription systems, particularly those focused on technical innovations in SR and NLP (although they may not be applied in health care). Secondly, most of the studies included in this review were short-term evaluations or proof-of-concept studies conducted in controlled environments or with small sample sizes. There is a lack of long-term, real-world data on the sustained use of AI transcription tools in clinical practice. As a result, this review cannot fully assess the long-term impact of AI transcription on clinician efficiency, patient care outcomes or system-wide healthcare improvements. Thirdly, given the narrative synthesis approach, this review could not draw strong, statistically powered conclusions about the overall effectiveness of AI transcription tools. Lastly, the review focused primarily on outcomes such as accuracy, time savings and clinician satisfaction, without addressing other potentially important dimensions, such as cost-effectiveness, user training requirements or implementation barriers. These additional factors could significantly affect the adoption and success of AI transcription tools in clinical practice, but they were not consistently reported in the studies reviewed.
Conclusions
In conclusion, this systematic review revealed that AI SR and transcription software has some potential to improve clinical documentation, enhance workflow efficiency and reduce the documentation burden on clinicians. Tools designed for specific medical domains can achieve high levels of accuracy, as evidenced by systems like AEGIS and NOMINDEX, which outperformed manual documentation. However, there was significant variability in the performance of AI SR and transcription tools across different software platforms and clinical environments, with general-purpose SR systems often producing high error rates and requiring time-consuming manual corrections.
This variability highlights that AI transcription software is still in a developmental phase, with much room for refinement before widespread adoption can be achieved, particularly in adapting systems to accents and complex medical language and in improving real-time error correction. Future work should also expand the scope beyond transcription alone, exploring end-to-end AI scribe capabilities and evaluating their real-world effectiveness.
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12911-025-03061-0.
Supplementary Material 1