ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models
Date
2024-03-02
Author
Bhattacharya, Manojit
Pal, Soumen
Chatterjee, Srijan
Alshammari, Abdulrahman
Albekairi, Thamer H.
Jagga, Supriya
Ige Ohimain, Elijah
Zayed, Hatem
Byrareddy, Siddappa N.
Lee, Sang-Soo
Wen, Zhi-Hong
Agoramoorthy, Govindasamy
Bhattacharya, Prosun
Chakraborty, Chiranjib
Metadata
Abstract
Recently, researchers have shown concern about the accuracy of ChatGPT-derived answers. Here, we conducted a series of tests using ChatGPT, performed by individual researchers at the multi-country level, to understand the pattern of its answer accuracy, reproducibility, answer length, plagiarism, and in-depth analysis, using two questionnaires (the first set with 15 MCQs and the second with 15 KBQs). Among the 15 MCQ-generated answers, 13 ± 0.70 were correct (Median: 82.5; coefficient of variance: 4.85), 3 ± 0.77 were incorrect (Median: 3; coefficient of variance: 25.81), and answers 1 to 10 were reproducible while 11 to 15 were not. Among the 15 KBQs, the length of each answer (in words) is about 294.5 ± 97.60 (mean range varies from 138.7 to 438.09), and the mean similarity index (in words) is about 29.53 ± 11.40 (coefficient of variance: 38.62) for each question. Statistical models were also developed using the analyzed parameters of the answers. The study shows a pattern of correctness and incorrectness in ChatGPT-derived answers and urges the development of an error-free, next-generation LLM to avoid misguiding users.
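As a rough, illustrative check (not part of the published analysis), the sketch below relates two of the mean ± SD values quoted in the abstract to their reported coefficients of variance, assuming the conventional definition CV = (SD / mean) × 100; the figures are copied from the abstract, and the helper function name is hypothetical.

# Illustrative sanity check (not from the paper): relating the mean ± SD
# values quoted in the abstract to the reported coefficients of variance,
# assuming CV = (SD / mean) * 100.

def coefficient_of_variance(mean: float, sd: float) -> float:
    """Return the coefficient of variance as a percentage."""
    return (sd / mean) * 100

# Values as reported in the abstract: (mean, SD, reported CV).
reported = {
    "incorrect MCQ answers": (3.0, 0.77, 25.81),
    "KBQ similarity index": (29.53, 11.40, 38.62),
}

for label, (mean, sd, reported_cv) in reported.items():
    cv = coefficient_of_variance(mean, sd)
    print(f"{label}: computed CV = {cv:.2f}%, reported CV = {reported_cv}%")

Under that assumed definition, the computed values (about 25.7% and 38.6%) are consistent with the coefficients of variance reported in the abstract.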
URI
https://www.sciencedirect.com/science/article/pii/S2590262824000200
Collections
- Biomedical Sciences [738 items]