Performance of artificial intelligence chatbot as a source of patient information on anti-rheumatic drug use in pregnancy

Artificial intelligence as source of information for anti-rheumatics during pregnancy



Anti-rheumatic drugs, artificial intelligence, ChatGPT, pregnancy


Background/Aim: Women with rheumatic and musculoskeletal disorders often discontinue their medications before conception or during the first few weeks of pregnancy, because drug use during pregnancy frequently causes anxiety. Pregnant women report seeking health-related information from a variety of sources, particularly the Internet, to ease their concerns about using such medications during pregnancy. The objective of this study was to evaluate the accuracy and completeness of the health-related information on anti-rheumatic medication use during pregnancy provided by OpenAI's widely known Chat Generative Pre-trained Transformer (ChatGPT), versions 3.5 and 4.

Methods: In this prospective cross-sectional study, the performance of OpenAI's ChatGPT versions 3.5 and 4 in providing health information about anti-rheumatic drugs during pregnancy was assessed against the 2016 European Alliance of Associations for Rheumatology (EULAR) guidelines as a reference. Fourteen queries derived from the guidelines were entered into both models. Two evaluators independently rated each response for accuracy on a predefined 6-point Likert-like scale (1 – completely incorrect to 6 – completely correct) and for completeness on a 3-point Likert-like scale (1 – incomplete to 3 – complete). Inter-rater reliability was evaluated using Cohen's kappa statistic, and scores were compared between the two ChatGPT versions using the Mann–Whitney U test.
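The two statistics named in the Methods can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis: it assumes unweighted Cohen's kappa and the large-sample normal approximation (with tie correction, without continuity correction) for the Mann–Whitney U test, because the article does not state which variants or software were used. The helper names (`cohens_kappa`, `average_ranks`, `mann_whitney_u`) and the example ratings are hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: proportion of items that received identical scores.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_o - p_e) / (1 - p_e)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x, plus a two-sided p-value from the normal
    approximation with tie correction (no continuity correction)."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    # U for x = rank sum of x minus its smallest possible rank sum.
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction shrinks the variance when many scores are equal.
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance == 0:
        return u_x, 1.0  # every score identical: no evidence of a difference
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical 1-6 accuracy ratings, NOT the study's data.
    rater_1 = [6, 6, 5, 2, 6, 4, 6]
    rater_2 = [6, 5, 5, 2, 6, 4, 6]
    print("kappa:", round(cohens_kappa(rater_1, rater_2), 3))
    gpt35 = [6, 5, 6, 2, 6, 4, 6]
    gpt4 = [6, 6, 5, 6, 2, 6, 6]
    u, p = mann_whitney_u(gpt35, gpt4)
    print("U:", u, "p:", round(p, 3))
```

Unweighted kappa treats a 5-versus-6 disagreement the same as a 1-versus-6 disagreement; for ordinal Likert-like scales a weighted kappa is a common alternative, and with only 14 items per model an exact Mann–Whitney test may be preferred over the normal approximation.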

Results: There was no statistically significant difference between the mean accuracy scores of GPT 3.5 and GPT 4 (5 [1.17] vs 5.07 [1.26]; P=0.769), indicating that both models scored between "nearly all accurate" and "completely correct". Likewise, the mean completeness scores of GPT 3.5 and GPT 4 did not differ significantly (2.5 [0.51] vs 2.64 [0.49]; P=0.541), with both models scoring between "adequate" and "comprehensive". The two models also had similar total mean accuracy and completeness scores (3.75 [1.55] vs 3.86 [1.57]; P=0.717). With GPT 3.5, hydroxychloroquine and leflunomide received full scores for both accuracy and completeness, whereas with GPT 4, methotrexate, sulfasalazine, cyclophosphamide, mycophenolate mofetil, and tofacitinib received the highest total scores. Nevertheless, for each model, the response for one of the 14 drugs was rated as more incorrect than correct.

Conclusions: Regarding the safety and compatibility of anti-rheumatic medications during pregnancy, both ChatGPT 3.5 and 4 demonstrated satisfactory accuracy and completeness. However, some of the responses generated by ChatGPT contained inaccurate information. Despite its good performance, ChatGPT should not be used as a standalone tool for decisions about medication use during pregnancy because of its limitations.




Cooper GS, Stroehla BC. The epidemiology of autoimmune diseases. Autoimmun Rev. 2003;2(3):119-25. doi: 10.1016/s1568-9972(03)00006-5.

Desai RJ, Huybrechts KF, Bateman BT, Hernandez-Diaz S, Mogun H, Gopalakrishnan C, et al. Brief Report: Patterns and Secular Trends in Use of Immunomodulatory Agents During Pregnancy in Women With Rheumatic Conditions. Arthritis Rheumatol. 2016;68(5):1183-9. doi: 10.1002/art.39521.

Grimes HA, Forster DA, Newton MS. Sources of information used by women during pregnancy to meet their information needs. Midwifery. 2014;30(1):e26-33. doi: 10.1016/j.midw.2013.10.007.

Serçekuş P, Değirmenciler B, Özkan S. Internet use by pregnant women seeking childbirth information. J Gynecol Obstet Hum Reprod. 2021;50(8):102144. doi: 10.1016/j.jogoh.2021.102144.

Bramham K, Soh MC, Nelson-Piercy C. Pregnancy and renal outcomes in lupus nephritis: an update and guide to management. Lupus. 2012;21(12):1271-83. doi: 10.1177/0961203312456893.

Pal S, Bhattacharya M, Lee SS, Chakraborty C. A Domain-Specific Next-Generation Large Language Model (LLM) or ChatGPT is Required for Biomedical Engineering and Research. Ann Biomed Eng. 2023;10. doi: 10.1007/s10439-023-03306-x.

Deiana G, Dettori M, Arghittu A, Azara A, Gabutti G, Castiglia P. Artificial Intelligence and Public Health: Evaluating ChatGPT Responses to Vaccination Myths and Misconceptions. Vaccines (Basel). 2023;11(7):1217. doi: 10.3390/vaccines11071217.

Deng J, Lin Y. The Benefits and Challenges of ChatGPT: An Overview. Frontiers in Computing and Intelligent Systems. 2023;2(2):81-3. doi: 10.54097/fcis.v2i2.4465.

Götestam SC, Hoeltzenbein M, Tincani A, Fischer-Betz R, Elefant E, Chambers C, et al. The EULAR points to consider for use of antirheumatic drugs before pregnancy, and during pregnancy and lactation. Ann Rheum Dis. 2016;75(5):795-810. doi: 10.1136/annrheumdis-2015-208840.

Johnson D, Goodman R, Patrinely J, Stone C, Zimmerman E, Donald R, et al. Assessing the Accuracy and Reliability of AI-Generated Medical Responses: An Evaluation of the Chat-GPT Model. Research Square. 2023. doi: 10.21203/

Olukman M, Parlar A, Orhan CE, Erol A. Gebelerde ilaç kullanımı: Son bir yıllık deneyim [Drug use in pregnant women: experience over the past year]. Turkish Journal of Obstetrics and Gynecology. 2006;3(4):255-61. doi: 10.17049/ataunihem.499684.

Riley LE, Cahill AG, Beigi R, Savich R, Saade G. Improving safe and effective use of medicines in pregnancy and lactation. American Journal of Perinatology. 2017;34(8):826-32. doi: 10.1055/s-0037-1598070.

Oliveira-Filho A, Vieira AES, Silva RC, Neves STF, Gama TAB, Lima RV, et al. Adverse medicine reactions in high-risk pregnant women. Saudi Pharmaceutical Journal. 2017;25(7):1073-7. doi: 10.1016/j.jsps.2017.01.005.

Sinclair M, Lagan BM, Dolk H, McCullough J. An assessment of pregnant women's knowledge and use of the internet for medication safety information and purchase. Journal of Advanced Nursing. 2018;74(1):137-47. doi: 10.1111/jan.13387.

Koyun A, Kesim Sİ. Gebelikte Karar Vermeye İnternetin Etkisi: Sistematik Bir İnceleme [The effect of the Internet on decision-making in pregnancy: a systematic review]. 3. Uluslararası Bilimsel Araştırmalar Kongresi Bildiri Kitabı [Proceedings of the 3rd International Scientific Research Congress]; 2018. p. 9-23.

Sabry Abdel-Messih M, Kamel Boulos MN. ChatGPT in Clinical Toxicology. JMIR Med Educ. 2023;9:e46876. doi: 10.2196/46876.

Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol. 2023;29(3):721-32. doi: 10.3350/cmh.2023.0089.

Alhaidry HM, Fatani B, Alrayes JO, Almana AM, Alfhaed NK. ChatGPT in Dentistry: A Comprehensive Review. Cureus. 2023;15(4):e38317. doi: 10.7759/cureus.38317.

Sharma SC, Ramchandani JP, Thakker A, Lahiri A. ChatGPT in Plastic and Reconstructive Surgery. Indian J Plast Surg. 2023;56(4):320-5. doi: 10.1055/s-0043-1771514.

Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, et al. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. arXiv. 2022. doi: 10.48550/arXiv.2212.14882.

Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5(3):e107-8. doi: 10.1016/S2589-7500(23)00021-3.

Alkaissi H, McFarlane SI. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus. 2023;15(2):e35179. doi: 10.7759/cureus.35179.






Research Article

How to Cite

Oruçoğlu N, Altunel Kılınç E. Performance of artificial intelligence chatbot as a source of patient information on anti-rheumatic drug use in pregnancy: Artificial intelligence as source of information for anti-rheumatics during pregnancy. J Surg Med [Internet]. 2023 Oct. 4 [cited 2024 May 25];7(10):651-5. Available from: