Simple item record

Author: Asif Naeem, M.
Author: Khan, Habib Ullah
Author: Aslam, Saad
Author: Jamil, Noreen
Date available: 2022-12-29T05:47:27Z
Publication date: 2020-08-12
Publication name: Electronics (Switzerland)
Identifier: http://dx.doi.org/10.3390/electronics9081299
Citation: Naeem, M. A., Khan, H. U., Aslam, S., & Jamil, N. (2020). Parallelisation of a Cache-Based Stream-Relation Join for a Near-Real-Time Data Warehouse. Electronics, 9(8), 1299.
URI: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85090373685&origin=inward
URI: http://hdl.handle.net/10576/37769
Abstract: Near-real-time data warehousing is an important area of research, as business organisations want to analyse their business sales with minimal latency. Therefore, sales data generated by data sources need to be reflected immediately in the data warehouse. This requires near-real-time transformation of the stream of sales data with a disk-based relation called master data in the staging area. For this purpose, a stream-relation join is required. The main problem in stream-relation joins is the different nature of the inputs; stream data is fast and bursty, whereas the disk-based relation is slow due to high disk I/O cost. To resolve this problem, a well-known algorithm, CACHEJOIN (cache join), was published in the literature. The algorithm has two phases, the disk-probing phase and the stream-probing phase. These two phases execute sequentially, which means stream tuples wait unnecessarily while the other phase runs. This prevents the algorithm from exploiting CPU resources optimally. In this paper, we address this issue by presenting a robust algorithm called PCSRJ (parallelised cache-based stream-relation join). The new algorithm enables the execution of both the disk-probing and stream-probing phases of CACHEJOIN in parallel. The algorithm distributes the disk-based relation across two separate nodes and enables parallel execution of CACHEJOIN on each node. The algorithm also implements a strategy of splitting the stream data between the nodes depending on the relevant part of the relation. We developed a cost model for PCSRJ and validated it empirically. We compared the service rates of both algorithms using a synthetic dataset. Our experiments showed that PCSRJ significantly outperforms CACHEJOIN.
Language: en
Publisher: MDPI
Subject: Data warehousing; Parallelisation; Performance evaluation; Semi-stream data; Semi-stream join
Title: Parallelisation of a cache-based stream-relation join for a near-real-time data warehouse
Type: Article
Issue number: 8
Volume number: 9
ESSN: 2079-9292
dc.accessType: Open Access
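
The partitioning idea described in the abstract (the relation split across two nodes, stream tuples routed to the node holding the relevant part, and both nodes joining in parallel) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' PCSRJ or CACHEJOIN implementation: the `MASTER` data, the range split at `SPLIT_KEY`, the `Node` class, and the first-seen cache policy are all hypothetical, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical master data (the "disk-based relation"): product id -> name.
MASTER = {pid: f"product-{pid}" for pid in range(1, 9)}
SPLIT_KEY = 4  # assumed range split: ids <= 4 on node 0, the rest on node 1


class Node:
    """One join node: a partition of the relation plus a small in-memory cache."""

    def __init__(self, partition, cache_size=2):
        self.partition = partition  # stands in for the on-disk partition
        self.cache = {}             # assumed policy: cache the first keys seen
        self.cache_size = cache_size
        self.cache_hits = 0

    def lookup(self, key):
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        value = self.partition.get(key)  # simulated disk probe
        if value is not None and len(self.cache) < self.cache_size:
            self.cache[key] = value
        return value

    def join(self, tuples):
        # Emit (key, quantity, name) for every stream tuple with a match.
        out = []
        for key, qty in tuples:
            name = self.lookup(key)
            if name is not None:
                out.append((key, qty, name))
        return out


def route(key):
    """Pick the node whose partition could contain this key."""
    return 0 if key <= SPLIT_KEY else 1


def parallel_join(stream):
    nodes = [
        Node({k: v for k, v in MASTER.items() if route(k) == 0}),
        Node({k: v for k, v in MASTER.items() if route(k) == 1}),
    ]
    batches = [[], []]
    for key, qty in stream:
        batches[route(key)].append((key, qty))
    # Run both node-local joins concurrently; map preserves node order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(lambda p: p[0].join(p[1]), zip(nodes, batches)))
    return results, nodes


stream = [(1, 10), (6, 3), (1, 5), (99, 1), (7, 2), (6, 4)]
results, nodes = parallel_join(stream)
print(results)
```

In this sketch the tuple with key 99 has no match in either partition and is dropped, while the repeated keys 1 and 6 are served from each node's cache on their second appearance. According to the abstract, the published algorithm additionally runs the disk-probing and stream-probing phases in parallel on each node, which this simplified sketch does not model.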

