Simple item record

Author: Naeem, M. Asif
Author: Mirza, Farhaan
Author: Khan, Habib Ullah
Author: Sundaram, David
Author: Jamil, Noreen
Author: Weber, Gerald
Date available: 2022-12-28T10:36:43Z
Date issued: 2020-10-23
Publication name: IEEE Access
Identifier: http://dx.doi.org/10.1109/ACCESS.2020.3033464
Citation: Naeem, M. A., Mirza, F., Khan, H. U., Sundaram, D., Jamil, N., & Weber, G. (2020). Big Data Velocity Management–From Stream to Warehouse via High Performance Memory Optimized Index Join. IEEE Access, 8, 195370-195384.
ISSN: 2169-3536
URI: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85102896549&origin=inward
URI: http://hdl.handle.net/10576/37715
Abstract: Efficient resource optimization is critical to manage the velocity and volume of real-time streaming data in near-real-time data warehousing and business intelligence. This article presents a memory optimisation algorithm for rapidly joining streaming data with persistent master data in order to reduce data latency. Typically, during the transformation phase of ETL (Extraction, Transformation, and Loading), a stream of transactional data needs to be joined with master data stored on disk. To implement this process, a semi-stream join operator is commonly used. Most semi-stream join operators cache frequent parts of the master data to improve their performance; this process requires careful distribution of allocated memory among the components of the join operator. This article presents a cache inequality approach to optimise cache size and memory. To test this approach, we present a novel Memory Optimal Index-based Join (MOIJ) algorithm. MOIJ supports many-to-many types of joins and adapts to dynamic streaming data. We also present a cost model for MOIJ and compare the performance with existing algorithms empirically as well as analytically. We envisage that the enhanced ability to process near-real-time streaming data using minimal memory will reduce latency in processing big data and will contribute to the development of high-performance real-time business intelligence systems.
Language: en
Publisher: IEEE
Subject: Big data; Cache inequality; High volume semi-stream data; Index-based join; Memory optimisation; Near-real-time data warehouse; Performance optimisation
Title: Big data velocity management-from stream to warehouse via high performance memory optimized index join
Type: Article
Pages: 195370-195384
Volume number: 8
Access type: Open Access

