ArabicWeb16: A new crawl for today's Arabic Web
Abstract
Web crawls provide valuable snapshots of the Web which enable a wide variety of research, be it distributional analysis to characterize Web properties or use of language, content analysis in social science, or Information Retrieval (IR) research to develop and evaluate effective search algorithms. While many English-centric Web crawls exist, existing public Arabic Web crawls are quite limited, limiting research and development. To remedy this, we present ArabicWeb16, a new public Web crawl of roughly 150M Arabic Web pages with significant coverage of dialectal Arabic as well as Modern Standard Arabic. For IR researchers, we expect ArabicWeb16 to support various research areas: ad-hoc search, question answering, filtering, cross-dialect search, dialect detection, entity search, blog search, and spam detection. 2016 ACM.
Collections
- Computer Science & Engineering [2402 items ]