Publication:
Evaluating the Competence of AI Chatbots in Answering Patient-Oriented Frequently Asked Questions on Orthognathic Surgery

Date

2025

Publisher

Multidisciplinary Digital Publishing Institute (MDPI)

Abstract

Objectives: This study aimed to evaluate the performance of three widely used artificial intelligence (AI) chatbots (ChatGPT-4, Gemini 2.5 Pro, and Claude Sonnet 4) in answering patient-oriented frequently asked questions (FAQs) about orthognathic surgery. Given patients' increasing reliance on AI tools in healthcare, it is essential to assess whether these tools provide accurate, empathetic, readable, and clinically appropriate information.

Methods: Twenty FAQs in Turkish about orthognathic surgery were presented to each chatbot. Three oral and maxillofacial surgeons evaluated the responses using a modified Global Quality Score (GQS), a binary clinical appropriateness judgment, and a five-point empathy rating scale; the evaluation was conducted in a double-blind manner. The Ateşman Readability Formula was applied to each response using an automated Python-based script. Comparative statistical analyses (ANOVA, Kruskal–Wallis, and post hoc tests) were used to identify significant performance differences among the chatbots.

Results: Gemini outperformed both GPT-4 and Claude in GQS, empathy, and clinical appropriateness (p < 0.001). GPT-4 achieved the highest readability scores (p < 0.001) but frequently lacked an empathetic tone and safety-oriented guidance. Claude showed moderate performance, balancing ethical caution with limited linguistic clarity. A moderate positive correlation was found between empathy and perceived response quality (r = 0.454, p = 0.044).

Conclusions: AI chatbots vary significantly in their ability to support surgical patient education. While GPT-4 offers superior readability, Gemini provides the most balanced and clinically reliable responses. These findings underscore the importance of context-specific chatbot selection and continuous clinical oversight to ensure safe and ethical AI-driven communication.
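The abstract mentions that readability was scored with an automated Python-based script applying the Ateşman Readability Formula. The authors' actual script is not published; the sketch below is a minimal illustration of how such a computation could look. The formula coefficients (198.825, 40.175, 2.610) are from Ateşman (1997); the vowel-counting syllable heuristic (valid for Turkish, where each vowel forms one syllable), the function names, and the sample sentence are assumptions for illustration only.

```python
import re

# Turkish vowels; in Turkish, each vowel corresponds to exactly one syllable.
TURKISH_VOWELS = set("aeıioöuüAEIİOÖUÜ")

def count_syllables(word: str) -> int:
    """Syllable count of a Turkish word = number of vowels it contains."""
    return sum(1 for ch in word if ch in TURKISH_VOWELS)

def atesman_score(text: str) -> float:
    """Ateşman (1997) readability score for Turkish text.

    Score = 198.825 - 40.175 * (syllables / words)
                    - 2.610  * (words / sentences)
    Higher values indicate easier-to-read text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return (198.825
            - 40.175 * (syllables / len(words))
            - 2.610 * (len(words) / len(sentences)))

# Example with a short, simple Turkish sentence (illustrative input):
print(round(atesman_score("Bu bir test."), 2))  # prints 150.82
```

Scores are conventionally binned into difficulty bands (e.g. very easy, easy, medium, hard), so a script like this would let each chatbot response be scored and compared without manual syllable counting.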
