A Bilingual Scene-To-Speech Mobile-Based Application
Abstract
Scene-To-Speech (STS) is the process of recognizing visual objects in a picture or a video and saying aloud a descriptive text that represents the scene. Recent advances in convolutional neural networks (CNNs), a class of deep learning feed-forward artificial neural networks, enable object recognition on mobile handheld devices in real time. Several applications have been developed to recognize objects in scenes and speak their descriptions aloud. However, the Arabic language is not fully supported. In this paper, we propose a bilingual mobile-based application that captures video scenes and processes their content to recognize objects in real time. The mobile application then speaks aloud, in English or Arabic, a description of the captured scene. The application can be extended to support eLearning technologies and edutainment games. People with visual impairments (VI), including people with low vision and blind people, can benefit from the application to learn about their surroundings. We conducted an elementary study of the mobile application with people with VI, and they expressed interest in using it in their daily lives.
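As a minimal sketch of the bilingual description step described above, the snippet below maps object labels returned by a recognizer to an English or Arabic sentence that a text-to-speech engine could then speak. The label set, the Arabic translations, and the phrasing templates are illustrative assumptions, not taken from the paper; the actual application's recognizer and TTS back end are not specified here.

```python
# Hypothetical English-to-Arabic label dictionary; a real system would
# cover the full class set of its object-recognition model.
LABELS_AR = {
    "person": "شخص",
    "car": "سيارة",
    "dog": "كلب",
    "chair": "كرسي",
}

def describe_scene(labels, language="en"):
    """Build a spoken-style sentence for the recognized object labels.

    `labels` is a list of English class names produced by the detector;
    `language` selects the output language ("en" or "ar").
    """
    if not labels:
        return ("No objects recognized." if language == "en"
                else "لم يتم التعرف على أي أشياء.")
    if language == "en":
        return "The scene contains: " + ", ".join(labels) + "."
    # Fall back to the English label when no Arabic entry exists.
    arabic = [LABELS_AR.get(label, label) for label in labels]
    return "يحتوي المشهد على: " + "، ".join(arabic) + "."

print(describe_scene(["person", "car"], "en"))
print(describe_scene(["person", "car"], "ar"))
```

In a deployed mobile app, the returned sentence would be passed to the platform's speech synthesizer (for example, a TTS engine configured for an English or Arabic voice), keeping recognition and speech generation as separate stages.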