A scalable solution for finding overlaps between sequences using map-reduce
Author | Haj Rachid, Maan |
Author | Malluhi, Qutaibah M. |
Date available | 2021-06-24T06:47:09Z |
Publication date | 2016 |
Publication name | Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 |
Source | Scopus |
Abstract | The overlap stage is considered one of the most time- and space-consuming stages in any de novo overlap-based assembler, including string graph-based assemblers. This is due to the huge output of next-generation sequencing technology, which comprises hundreds of millions of reads. In this study, we take advantage of the MapReduce programming model to find the overlaps between sequences. The proposed solution is scalable and can handle huge input and output sizes that cannot be handled by existing solutions. The solution achieves perfect linear performance scalability with an increasing number of processing nodes for huge data sets. The method optimizes the output size by reporting a string representing a suffix-prefix match once, even if this string is involved in multiple matches. Running the algorithm in an Amazon cloud environment has demonstrated substantially lower cost than other state-of-the-art techniques for solving the same problem. The solution has been implemented as a tool that is freely available to the research community. Copyright ISCA. |
Language | en |
Publisher | The International Society for Computers and Their Applications (ISCA) |
Subject | All-pairs suffix prefix; Bioinformatics; Map-reduce; Sequence analysis |
Type | Conference Paper |
Pages | 77-82 |
This item appears in the following collections
- Interdisciplinary Research and Smart Designs [15 items]