RELEVANCE SCORING OF ARABIC AND ENGLISH WRITTEN ESSAYS WITH DENSE RETRIEVAL
Abstract
Automated Essay Scoring (AES) automates the grading of essays, which can help students improve their writing proficiency. While research on holistic essay scoring is prevalent, there is a noticeable gap in scoring essays on specific quality traits. In this thesis, we focus on the relevance trait, which measures the student's ability to stay on topic throughout the entire essay. We propose a novel approach for graded relevance scoring of written essays that employs dense encoders. The dense representations of essays at each relevance level form clusters in the embedding space, and the centroids of these clusters are potentially separated enough to represent their relevance levels effectively. We therefore use simple 1-Nearest-Neighbor (1-NN) classification over these centroids to determine the relevance level of an unseen essay. We evaluate our approach in both task-specific (training and testing on the same task) and cross-task (testing on unseen tasks) scenarios using English (ASAP) and Arabic (in-house) datasets. For English, our method achieves state-of-the-art performance in the task-specific setting and matches baseline performance in the cross-task setting, while a few-shot analysis shows that it reduces labeling costs at the cost of only a 9% drop in effectiveness. For Arabic, our approach outperforms the baselines by 5 and 2 points in the task-specific and cross-task settings, respectively.
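The centroid-based 1-NN step described in the abstract can be illustrated with a minimal sketch. It assumes essay embeddings have already been produced by some dense encoder (the specific encoder is not named here, so plain vectors stand in for it), and it assumes cosine similarity as the distance measure, which the abstract does not specify. The function names `fit_centroids` and `predict_relevance` are hypothetical, not taken from the thesis.

```python
import numpy as np


def fit_centroids(embeddings, labels):
    """Compute one centroid (mean embedding) per relevance level.

    `embeddings` is an (n_essays, dim) array of precomputed dense essay
    vectors; the encoder that produced them is assumed, not specified.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    levels = np.unique(labels)  # sorted, distinct relevance levels
    centroids = np.stack(
        [embeddings[labels == lvl].mean(axis=0) for lvl in levels]
    )
    return levels, centroids


def _l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def predict_relevance(query_embeddings, levels, centroids):
    """Assign each unseen essay the level of its nearest centroid (1-NN).

    Cosine similarity is an assumed choice of metric for this sketch.
    """
    q = _l2_normalize(np.atleast_2d(query_embeddings))
    c = _l2_normalize(centroids)
    sims = q @ c.T  # shape: (n_queries, n_levels)
    return levels[np.argmax(sims, axis=1)]


# Toy usage with made-up 2-D "embeddings" (illustrative only, not real data).
train_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_lab = [0, 0, 2, 2]
levels, cents = fit_centroids(train_emb, train_lab)
print(predict_relevance([[0.8, 0.2], [0.2, 0.7]], levels, cents))  # prints [0 2]
```

Reducing each relevance level to a single centroid keeps inference cheap: an unseen essay is compared against only as many vectors as there are relevance levels, rather than against every labeled training essay.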
DOI/handle
http://hdl.handle.net/10576/66444