QU-IR at SemEval 2016 Task 3: Learning to rank on Arabic community question answering forums with word embedding
Abstract
Resorting to community question answering (CQA) websites to find answers has gained momentum over the past decade, driven by the explosive growth of social media. With many questions left unanswered on those websites, automatic question answering systems have emerged. One of the main objectives of such systems is to harness the large body of existing answered questions, thereby transforming the problem into one of finding good answers to newly posed questions among similar, previously answered ones. As SemEval 2016 Task 3, "Community Question Answering", focused on this problem, we participated in its Arabic subtask. Our system adopts a supervised learning approach in which a learning-to-rank model is trained on data (questions and answers) extracted from Arabic CQA forums, using word2vec features generated from that data. Our primary submission achieved a 29.7% improvement over the MAP score of the baseline. Post-submission experiments were then conducted to integrate variations of the word2vec features into our system. Integrating covariance word embedding features raised the improvement over the baseline to 37.9%. © 2016 Association for Computational Linguistics.
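The abstract does not spell out how the word2vec and covariance features are computed, so the following is only a minimal sketch of one plausible reading: each text is represented by the mean of its word vectors and by the upper triangle of their covariance matrix, and a question-candidate pair is described by the cosine similarity of each representation. The toy `EMBEDDINGS` table, the English tokens, and the function names are all hypothetical stand-ins for vectors trained on the Arabic forum data. The learning-to-rank model that would consume these features is not shown.

```python
from math import sqrt

# Hypothetical 3-dimensional vectors standing in for trained word2vec embeddings.
EMBEDDINGS = {
    "fever": [0.9, 0.1, 0.0],
    "child": [0.2, 0.8, 0.1],
    "treatment": [0.7, 0.2, 0.3],
    "football": [0.0, 0.1, 0.9],
    "match": [0.1, 0.0, 0.8],
}
DIM = 3


def vectors(tokens):
    """Look up the vectors of the tokens that have an embedding."""
    return [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]


def mean_feature(tokens):
    """Average the word vectors of a text (zero vector if none are known)."""
    vecs = vectors(tokens)
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def covariance_feature(tokens):
    """Flatten the upper triangle of the sample covariance of the word vectors."""
    vecs = vectors(tokens)
    n = len(vecs)
    if n < 2:
        # Covariance is undefined for fewer than two vectors; fall back to zeros.
        return [0.0] * (DIM * (DIM + 1) // 2)
    mu = mean_feature(tokens)
    feats = []
    for i in range(DIM):
        for j in range(i, DIM):
            s = sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vecs)
            feats.append(s / (n - 1))
    return feats


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def pair_features(question, candidate):
    """Two similarity features for a (new question, candidate) pair."""
    return [
        cosine(mean_feature(question), mean_feature(candidate)),
        cosine(covariance_feature(question), covariance_feature(candidate)),
    ]
```

For instance, `pair_features(["child", "fever"], ["fever", "treatment", "child"])` yields a higher mean-embedding similarity than the same question paired with `["football", "match"]`, which is the kind of signal a ranker could learn from.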