A scalable solution for finding overlaps between sequences using map-reduce
Author | Haj Rachid, Maan |
Author | Malluhi, Qutaibah M. |
Available date | 2021-06-24T06:47:09Z |
Publication Date | 2016 |
Publication Name | Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016 |
Resource | Scopus |
Abstract | The overlap stage of a string graph-based assembler is considered one of the most time- and space-consuming stages of any de novo overlap-based assembler. This is due to the huge output of next-generation sequencing technology, which can amount to hundreds of millions of reads. In this study, we take advantage of the MapReduce programming model to find the overlaps between sequences. The proposed solution is scalable and can handle huge input and output sizes that cannot be handled by existing solutions. The solution achieves perfect linear performance scalability with an increasing number of processing nodes for huge data sets. The method optimizes the output size by reporting a string representing a suffix-prefix match only once, even if this string is involved in multiple matches. Running the algorithm in an Amazon cloud environment has demonstrated substantially lower cost than other state-of-the-art techniques for solving the same problem. The solution has been implemented as a tool that is freely available to the research community. Copyright ISCA. |
Language | en |
Publisher | The International Society for Computers and Their Applications (ISCA) |
Subject | All-pairs suffix prefix; Bioinformatics; Map-reduce; Sequence analysis |
Type | Conference Paper |
Pagination | 77-82 |