ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models
Author | Manojit, Bhattacharya |
Author | Pal, Soumen |
Author | Chatterjee, Srijan |
Author | Alshammari, Abdulrahman |
Author | Albekairi, Thamer H. |
Author | Jagga, Supriya |
Author | Ige Ohimain, Elijah |
Author | Zayed, Hatem |
Author | Byrareddy, Siddappa N. |
Author | Lee, Sang-Soo |
Author | Wen, Zhi-Hong |
Author | Agoramoorthy, Govindasamy |
Author | Bhattacharya, Prosun |
Author | Chakraborty, Chiranjib |
Available date | 2024-06-12T10:59:04Z |
Publication Date | 2024-03-02 |
Publication Name | Current Research in Biotechnology |
Identifier | http://dx.doi.org/10.1016/j.crbiot.2024.100194 |
Citation | Bhattacharya, M., Pal, S., Chatterjee, S., Alshammari, A., Albekairi, T. H., Jagga, S., ... & Chakraborty, C. (2024). ChatGPT’s scorecard after the performance in a series of tests conducted at the multi-country level: A pattern of responses of generative artificial intelligence or large language models. Current Research in Biotechnology, 100194. |
Abstract | Recently, researchers have raised concerns about ChatGPT-derived answers. Here, we conducted a series of tests using ChatGPT, performed by individual researchers at the multi-country level, to understand the patterns in its answers’ accuracy, reproducibility, length, plagiarism, and depth, using two questionnaires (the first set with 15 multiple-choice questions, MCQ, and the second with 15 knowledge-based questions, KBQ). Among the 15 MCQ-generated answers, 13 ± 0.70 were correct (Median: 82.5; Coefficient of variance: 4.85) and 3 ± 0.77 were incorrect (Median: 3; Coefficient of variance: 25.81); answers to questions 1 to 10 were reproducible, while those to questions 11 to 15 were not. Among the 15 KBQ, the length of each answer is about 294.5 ± 97.60 words (mean ranging from 138.7 to 438.09), and the mean similarity index is about 29.53 ± 11.40 (Coefficient of variance: 38.62) per answer. Statistical models were also developed from the analyzed answer parameters. The study reveals a pattern of correctness and incorrectness in ChatGPT-derived answers and underscores the need for an error-free, next-generation LLM to avoid misguiding users. |
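The abstract reports its results as mean ± standard deviation, median, and coefficient of variance. A minimal sketch of how such summary statistics are computed is shown below; the per-tester scores are made-up example data, not the study's actual results:

```python
# Hypothetical illustration of the summary statistics reported in the
# abstract (mean +/- SD, median, coefficient of variance). The score
# list below is invented example data, NOT the study's measurements.
import statistics


def summarize(scores):
    """Return mean, sample SD, median, and coefficient of variance (%)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)    # sample standard deviation
    median = statistics.median(scores)
    cv = 100 * sd / mean             # coefficient of variance, in percent
    return mean, sd, median, cv


# Example: correct-answer counts (out of 15 MCQs) from five hypothetical testers
mean, sd, median, cv = summarize([13, 13, 12, 14, 13])
print(f"mean={mean:.2f}, sd={sd:.2f}, median={median}, cv={cv:.2f}%")
```

A small coefficient of variance (SD as a percentage of the mean) indicates that the testers' scores clustered tightly, which is how the abstract's 4.85 figure for correct answers should be read.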
Sponsor | This work was funded by the Researchers Supporting Project number (RSP2024R491), King Saud University, Riyadh, Saudi Arabia. |
Language | en |
Publisher | Elsevier |
Subject | ChatGPT; Accuracy; Reproducibility; Plagiarism; Answer length |
Type | Article |
Volume Number | 7 |
Open Access User License | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
ESSN | 2590-2628 |
Collection | Biomedical Sciences [739 items] |