Performance of artificial intelligence chatbot as a source of patient information on anti-rheumatic drug use in pregnancy
Artificial intelligence as source of information for anti-rheumatics during pregnancy
Keywords: Anti-rheumatic drugs, artificial intelligence, ChatGPT, pregnancy
Background/Aim: Women with rheumatic and musculoskeletal disorders often discontinue their medications before conception or during the first few weeks of pregnancy because drug use during pregnancy frequently causes anxiety. Pregnant women report seeking health-related information from a variety of sources, particularly the Internet, to ease their concerns about using such medications during pregnancy. The objective of this study was to evaluate the accuracy and completeness of health-related information on the use of anti-rheumatic medications during pregnancy provided by OpenAI's Chat Generative Pre-trained Transformer (ChatGPT) versions 3.5 and 4, two widely known artificial intelligence (AI) tools.
Methods: In this prospective cross-sectional study, the performance of OpenAI's ChatGPT versions 3.5 and 4 in providing health information on anti-rheumatic drugs during pregnancy was assessed using the 2016 European Alliance of Associations for Rheumatology (EULAR) guidelines as a reference. Fourteen queries derived from the guidelines were entered into both AI models. Two evaluators independently rated each response for accuracy using a predefined 6-point Likert-like scale (1 = completely incorrect to 6 = completely correct) and for completeness using a 3-point Likert-like scale (1 = incomplete to 3 = complete). Inter-rater reliability was evaluated using Cohen’s kappa statistic, and differences in scores between the ChatGPT versions were compared using the Mann–Whitney U test.
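The two statistics named in the Methods can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, the ratings in the usage example are invented for demonstration and are not study data, the kappa is the unweighted form (the abstract does not state whether a weighted variant was used), and the Mann–Whitney P value uses a normal approximation with tie correction and no continuity correction, so it may differ slightly from a statistical package's output.

```python
from collections import Counter
from math import erfc, sqrt


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}      # value -> average rank among the pooled observations
    tie_term = 0    # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0  # all observations identical: no evidence of difference
    z = (u - n1 * n2 / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p_value


if __name__ == "__main__":
    # Hypothetical 6-point accuracy ratings, for illustration only.
    evaluator_1 = [6, 5, 5, 4, 6, 3, 5]
    evaluator_2 = [6, 5, 4, 4, 6, 3, 5]
    model_a = [6, 5, 5, 4, 6, 3, 5]
    model_b = [6, 6, 5, 4, 5, 2, 6]
    print("kappa:", cohen_kappa(evaluator_1, evaluator_2))
    print("U, P:", mann_whitney_u(model_a, model_b))
```

Because Likert-like scores are ordinal and heavily tied, a rank-based test with tie correction is a natural fit; with only 14 items per model, an exact method from a statistical package would be a reasonable alternative to the approximation used here.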
Results: No statistically significant difference was found between the mean accuracy scores of GPT versions 3.5 and 4 (5 [1.17] versus 5.07 [1.26]; P=0.769), indicating that scores for both models fell between "nearly all accurate" and "correct." Likewise, no statistically significant difference was found between the mean completeness scores of GPT 3.5 and GPT 4 (2.5 [0.51] versus 2.64 [0.49]; P=0.541), indicating scores between "adequate" and "comprehensive" for both models. Both models had similar total mean accuracy and completeness scores (3.75 [1.55] versus 3.86 [1.57]; P=0.717). In the GPT 3.5 model, hydroxychloroquine and leflunomide received full scores for both accuracy and completeness, while methotrexate, sulfasalazine, cyclophosphamide, mycophenolate mofetil, and tofacitinib received the highest total scores in the GPT 4 model. Nevertheless, for both models, the response for one of the 14 drugs was scored as more incorrect than correct.
Conclusions: Both ChatGPT versions 3.5 and 4 demonstrated satisfactory accuracy and completeness when addressing the safety and compatibility of anti-rheumatic medications during pregnancy. However, the responses generated by ChatGPT also contained inaccurate information. Despite its good performance, ChatGPT should not be used as a standalone tool for making decisions about taking medications during pregnancy because of its limitations.
Copyright (c) 2023 Nurdan Oruçoğlu, Elif Altunel Kılınç
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.