Simple item record

Author: Karkar A.
Author: Kunhoth J.
Author: Al-Maadeed, Somaya
Date available: 2022-05-19T10:23:11Z
Publication date: 2020
Publication name: 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies, ICIoT 2020
Source: Scopus
Identifier: http://dx.doi.org/10.1109/ICIoT48696.2020.9089557
URI: http://hdl.handle.net/10576/31123
Abstract: The concept of Scene-to-Speech (STS) is to recognize elements in a captured image or video clip and read aloud an informative textual description of the scene. Recent progress in convolutional neural networks (CNNs) makes real-time object recognition feasible on handheld mobile devices. A considerable number of applications have been developed to recognize objects in scenes and read their descriptive messages aloud. However, the deployment of multiple trained deep learning (DL) models is not fully supported. In our previous work, we developed a mobile application that captures images, recognizes the objects they contain, constructs descriptive sentences, and speaks them in Arabic and English. The notion of employing multiple trained DL models was introduced there, but no experimentation was conducted. In this article, we extend our previous work with the required assessments of using multiple trained DL models. The main aim is to show that deploying multiple models can reduce the complexity of maintaining one large compound model and can improve prediction time. To this end, we examine the prediction accuracy of single-DL-model and multiple-DL-model recognition scenarios. The assessment results showed significant improvement in both prediction accuracy and prediction time. From the end-user perspective, the application is designed primarily to assist visually impaired people in understanding their surroundings. In this context, we conducted a usability study of the proposed application with both sighted and visually impaired participants. Participants showed strong interest in using the mobile application daily.
Sponsor: This publication was supported by Qatar University Collaborative High Impact Grant QUHI-CENG-18/19-1. The findings achieved herein are solely the responsibility of the authors. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of Qatar University.
Language: en
Publisher: Institute of Electrical and Electronics Engineers Inc.
Subject: Deep learning
Internet of things
Mobile computing
Object recognition
Convolution neural network
English languages
Mobile applications
Model-based recognition
Multiple models approaches
Prediction accuracy
Usability studies
Visually impaired people
Forecasting
Title: A Scene-to-Speech Mobile based Application: Multiple Trained Models Approach
Type: Conference Paper
Pages: 490-497


Files in this item

Files | Size | Format | View

No files are associated with this item.

This item appears in the following collection(s)
