ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models

Manojit, Bhattacharya; Pal, Soumen; Chatterjee, Srijan; Alshammari, Abdulrahman; Albekairi, Thamer H.; Jagga, Supriya; Ige Ohimain, Elijah; Zayed, Hatem; Byrareddy, Siddappa N.; Lee, Sang-Soo; Wen, Zhi-Hong; Agoramoorthy, Govindasamy; Bhattacharya, Prosun; Chakraborty, Chiranjib

Author	Manojit, Bhattacharya
Author	Pal, Soumen
Author	Chatterjee, Srijan
Author	Alshammari, Abdulrahman
Author	Albekairi, Thamer H.
Author	Jagga, Supriya
Author	Ige Ohimain, Elijah
Author	Zayed, Hatem
Author	Byrareddy, Siddappa N.
Author	Lee, Sang-Soo
Author	Wen, Zhi-Hong
Author	Agoramoorthy, Govindasamy
Author	Bhattacharya, Prosun
Author	Chakraborty, Chiranjib
Date available	2024-06-12T10:59:04Z
Publication date	2024-03-02
Publication name	Current Research in Biotechnology
Identifier	http://dx.doi.org/10.1016/j.crbiot.2024.100194
Citation	Bhattacharya, M., Pal, S., Chatterjee, S., Alshammari, A., Albekairi, T. H., Jagga, S., ... & Chakraborty, C. (2024). ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models. Current Research in Biotechnology, 100194.
URI	https://www.sciencedirect.com/science/article/pii/S2590262824000200
URI	http://hdl.handle.net/10576/56120
Abstract	Recently, researchers have raised concerns about ChatGPT-derived answers. Here, we conducted a series of tests using ChatGPT, run by individual researchers at the multi-country level, to understand the pattern of its answer accuracy, reproducibility, answer length, plagiarism, and depth using two questionnaires (the first set with 15 MCQs and the second with 15 KBQs). Among the 15 MCQ-generated answers, 13 ± 70 were correct (median: 82.5; coefficient variance: 4.85), 3 ± 0.77 were incorrect (median: 3; coefficient variance: 25.81), 1 to 10 were reproducible, and 11 to 15 were not. Among the 15 KBQs, the length of each question (in words) is about 294.5 ± 97.60 (mean range varies from 138.7 to 438.09), and the mean similarity index (in words) is about 29.53 ± 11.40 (coefficient variance: 38.62) for each question. Statistical models were also developed using the analyzed parameters of the answers. The study shows a pattern of correctness and incorrectness in ChatGPT-derived answers and calls for an error-free, next-generation LLM to avoid misguiding users.
Sponsor	This work was funded by the Researchers Supporting Project number (RSP2024R491), King Saud University, Riyadh, Saudi Arabia.
Language	en
Publisher	Elsevier
Subject	ChatGPT; Accuracy; Reproducibility; Plagiarism; Answer length
Title	ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models
Type	Article
Volume number	7
Open Access user License	http://creativecommons.org/licenses/by-nc-nd/4.0/
ESSN	2590-2628
dc.accessType	Open Access

Publisher version (open access)

Files in this record

Name: 1-s2.0-S2590262824000200-main.pdf
Size: 7.298Mb
Format: PDF


This record appears in the following collections

Biomedical Sciences [768 items]
