xRead - Incorporating Artificial Intelligence into Clinical Practice (March 2026)
Carnino et al.
Methods
Data collection
Every original article published by JAMA – Oto from January 2022 to November 2022 (88 total articles) was extracted to represent the articles published before ChatGPT's release in November 2022. Other article types, such as reviews, case reports, opinions, and commentaries, were excluded from this analysis because their varied lengths and inconsistent structures complicated direct comparisons. This timeline was chosen because ChatGPT's release marks a pivotal moment in the adoption of AI for text generation, although many similar programs have been released since. Every original article published by JAMA – Oto from March 2023 to September 2023 (55 total articles) was extracted to represent the articles published after ChatGPT's release. Articles from December 2022 to February 2023 were not included in the analysis, in order to exclude articles written and/or submitted before the chatbot was available. Free text from each article's abstract, introduction, methods, results, and discussion was entered individually into ZeroGPT.com, an online application used to estimate the percentage of a text generated by AI [18–20]. The percentage of text generated by AI in each article was recorded. The country of the corresponding author listed on each manuscript was also extracted.
Statistical analysis
ChatGPT was introduced on November 30, 2022. The mean percentages of AI-generated text in each section were compared, stratified by whether the article was published before or after the introduction of ChatGPT. Prior to this comparison, Fligner-Killeen's test for the homogeneity of variances was performed. This non-parametric test checks the equal-variance assumption of a standard t-test without assuming that the data are normally distributed. Based on these results, t-tests, adjusted where variances were unequal, were conducted to compare the means. Additionally, the rate of AI-generated text usage by the country of academic affiliation of the principal investigator (PI) was of interest. The means of detected AI-generated text were compared, stratified by whether the country of academic affiliation of the PI was primarily English-speaking. Fligner-Killeen's test for the homogeneity of variances was performed, and t-tests were conducted according to these results. A significance level of 0.05 was used for all analyses. All analyses were performed in R (version 4.3.1).
Results
The overall mean percentage of AI-written text is presented in Table 1. Notably, the results section displayed the highest percentage of estimated AI-written text, at 49.68%, while the discussion section contained the least, at 19.30%. The findings from Fligner-Killeen's tests are detailed in Tables 2 and 3. The discussion section, when categorized by periods before and after the introduction of ChatGPT, showed unequal variances (p = 0.002), leading to an adjustment of the t-test using the Welch approximation, as shown in Table 4. Significant differences were observed in the mean percentage of AI-generated text in the abstract, introduction, and discussion sections when comparing averages before and after ChatGPT's release, as shown in Fig. 1. Furthermore, significant disparities were found in the mean percentage of AI-generated text within the same sections
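The choice between a standard and a Welch-adjusted t-test described under Statistical analysis can be illustrated with a minimal Python sketch. This is not the authors' R code: the function name `two_sample_t`, the sample values, and the use of Python are assumptions made purely for illustration. The Fligner-Killeen test and the final p-value are not computed here; the sketch takes the variance-test p-value as an input and returns only the t statistic and degrees of freedom. In R, the corresponding base functions are `fligner.test` and `t.test`.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b, variance_test_p, alpha=0.05):
    """Return (t statistic, degrees of freedom, method) for two samples.

    variance_test_p is the p-value of a homogeneity-of-variance test run
    beforehand (Fligner-Killeen in the paper; not implemented here).
    """
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)

    if variance_test_p < alpha:
        # Variances differ: Welch's t-test with Welch-Satterthwaite df.
        se1, se2 = v1 / n1, v2 / n2
        t = (m1 - m2) / sqrt(se1 + se2)
        df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
        return t, df, "welch"

    # Variances treated as equal: Student's t-test with pooled variance.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2, "student"


if __name__ == "__main__":
    # Hypothetical AI-text percentages, NOT data from the study.
    before = [10.0, 12.0, 14.0, 16.0]
    after = [20.0, 30.0, 40.0, 50.0, 60.0]
    # 0.002 mirrors the variance-test p-value reported for the discussion section.
    print(two_sample_t(before, after, variance_test_p=0.002))
```

With a variance-test p-value below 0.05 the sketch takes the Welch branch, as reported for the discussion section; otherwise it falls back to the pooled-variance statistic.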
Eur Arch Otorhinolaryngol. Author manuscript; available in PMC 2025 November 01.