Can We Build a Search Engine over Spark?

Al-Rasbi, Sara; Elsayed, Tamer

Date

2020

Abstract

Search engines must process huge amounts of data in scalable and efficient ways to produce effective search results. In this paper, we address the problem of building an efficient and scalable experimental search engine over Spark, an in-memory distributed big data processing framework. The proposed system, SparkIR, can serve as a research framework for conducting information retrieval (IR) experiments. SparkIR supports a document-based partitioning scheme for indexing and document-at-a-time (DAAT) processing for query evaluation. Moreover, it offers static pruning (using champion lists) to improve retrieval efficiency. We evaluated the performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M English Web pages. Experiments over different subsets of the collection showed that SparkIR exhibits reasonable efficiency and scalability overall for both indexing and retrieval.
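
The three techniques named in the abstract (document-based partitioning, champion lists, and DAAT evaluation) can be illustrated with a small sketch. This is not the SparkIR implementation and it does not use Spark: plain Python dictionaries stand in for index partitions, the score is a simple sum of term frequencies (an assumption made for brevity, not the ranking function of the paper), and all function names (`build_partition_index`, `champion_lists`, `daat`, `search`) are hypothetical.

```python
import heapq
from collections import Counter, defaultdict


def build_partition_index(docs):
    """Index one document partition: term -> [(doc_id, tf), ...] sorted by doc_id."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term, tf in Counter(docs[doc_id].lower().split()).items():
            index[term].append((doc_id, tf))
    return dict(index)


def champion_lists(index, r):
    """Static pruning: keep the r highest-tf postings per term, re-sorted by doc_id."""
    return {
        term: sorted(heapq.nlargest(r, postings, key=lambda p: p[1]))
        for term, postings in index.items()
    }


def daat(index, query_terms, k):
    """Document-at-a-time: score one document fully before moving to the next."""
    lists = [index[t] for t in query_terms if t in index]
    cursors = [0] * len(lists)
    heap = []  # min-heap of (score, -doc_id), holding at most k entries
    while True:
        heads = [lst[c][0] for lst, c in zip(lists, cursors) if c < len(lst)]
        if not heads:
            break
        doc_id = min(heads)  # smallest unprocessed doc_id across all lists
        score = 0
        for i, lst in enumerate(lists):
            c = cursors[i]
            if c < len(lst) and lst[c][0] == doc_id:
                score += lst[c][1]  # assumed scoring: sum of term frequencies
                cursors[i] += 1
        entry = (score, -doc_id)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)
    return [(-neg_id, score) for score, neg_id in sorted(heap, reverse=True)]


def search(partitions, query, k):
    """Run DAAT on every partition, then merge the per-partition top-k lists."""
    terms = query.lower().split()
    merged = []
    for index in partitions:
        merged.extend(daat(index, terms, k))
    return heapq.nsmallest(k, merged, key=lambda hit: (-hit[1], hit[0]))


# Two toy partitions; each document lives in exactly one of them.
docs_a = {1: "spark search engine", 2: "spark spark spark index"}
docs_b = {3: "search engine over spark search", 4: "cooking recipes"}
partitions = [build_partition_index(docs_a), build_partition_index(docs_b)]
top_hits = search(partitions, "spark search", 2)  # list of (doc_id, score)
pruned_a = champion_lists(partitions[0], 1)
```

Because every document belongs to a single partition, each partition can be queried independently and only the small per-partition top-k lists need to be merged. Passing the output of `champion_lists` to `daat` in place of the full index trades some recall for shorter postings lists.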

DOI/handle

http://dx.doi.org/10.1109/ICIoT48696.2020.9089558
http://hdl.handle.net/10576/60887
