
Author: Haj Rachid, Maan
Author: Malluhi, Qutaibah M.
Available date: 2021-06-24T06:47:09Z
Publication Date: 2016
Publication Name: Proceedings of the 8th International Conference on Bioinformatics and Computational Biology, BICOB 2016
Resource: Scopus
URI: http://hdl.handle.net/10576/20840
Abstract: The overlap stage of a string graph-based assembler is considered one of the most time- and space-consuming stages in any de novo overlap-based assembler. This is due to the huge output of next-generation sequencing technologies, which produce hundreds of millions of reads. In this study, we take advantage of the MapReduce programming model to find the overlaps between sequences. The proposed solution is scalable and can handle huge input and output sizes that cannot be handled by existing solutions. The solution achieves perfect linear performance scalability with an increasing number of processing nodes for huge data sets. The method optimizes the output size by reporting a string representing a suffix-prefix match once, even if this string is involved in multiple matches. Running the algorithm in an Amazon cloud environment has demonstrated substantially lower cost than using other state-of-the-art techniques for solving the same problem. The solution has been implemented as a tool that is freely available to the research community. Copyright ISCA.
Language: en
Publisher: The International Society for Computers and Their Applications (ISCA)
Subject: All-pairs suffix prefix; Bioinformatics; Map-reduce; Sequence analysis
Title: A scalable solution for finding overlaps between sequences using map-reduce
Type: Conference Paper
Pagination: 77-82

