IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs
Author | Suwaileh, Reem |
Author | Elsayed, Tamer |
Author | Imran, Muhammad |
Date available | 2024-11-05T06:05:18Z |
Publication date | 2023 |
Publication name | ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings |
Source | Scopus |
Identifier | http://dx.doi.org/10.18653/v1/2023.arabicnlp-1.14 |
Abstract | Extracting and disambiguating geolocation information from social media data enables effective disaster management, as it helps response authorities, for example, to locate incidents when planning rescue activities and to locate affected people for evacuation. Nevertheless, the dearth of resources and tools hinders the development and evaluation of Location Mention Disambiguation (LMD) models in the disaster management domain. Consequently, the LMD task is greatly understudied, especially for low-resource languages such as Arabic. To fill this gap, we introduce IDRISI-D, the largest English and the first Arabic public LMD datasets to date. Additionally, we introduce a modified hierarchical evaluation framework that offers a lenient and nuanced evaluation of LMD systems. We further benchmark the IDRISI-D datasets using representative baselines and show the competitiveness of BERT-based models. |
Sponsor | This work was made possible by the Graduate Sponsorship Research Award (GSRA) #GSRA5-1-0527-18082 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors. |
Language | en |
Publisher | Association for Computational Linguistics (ACL) |
Subject | Disaster prevention; Disasters; Hierarchical systems; Disaster management; Evaluation framework; Geolocations; Hierarchical evaluation; Low resource languages; Management domains; Micro-blog; Rescue activities; Social media datum; Location |
Type | Conference |
Pages | 158-169 |