
    VisualAid+: Assistive System for Visually Impaired with TinyML Enhanced Object Detection and Scene Narration

    View/Open
    VisualAid_Assistive_System_for_Visually_Impaired_with_TinyML_Enhanced_Object_Detection_and_Scene_Narration.pdf (3.23 MB)
    Date
    2023
    Author
    Kunhoth, Jayakanth
    Alkaeed, Mahdi
    Ehsan, Adeel
    Qadir, Junaid
    Abstract
    People with visual impairments use various assistive technologies in their daily lives for activities such as navigation and reading text. Recent technological advances have enabled developers to deploy and run assistive applications directly on embedded devices. In this paper, we propose VisualAid+, a wearable assistive system for people with visual impairments. Leveraging TinyML and Edge AI, the portable system performs object detection and visual scene narration. It follows a hierarchical approach that combines a complex prediction model (a full TensorFlow model) with Lite models (TensorFlow Lite and TensorFlow Lite Micro), making the system capable of both on-device and server-side inference. The proposed VisualAid+ system consists of a Raspberry Pi, a computer acting as a server, a power source, and two cameras mounted on wearable glasses, one of which is attached to an ESP32 microcontroller. The cameras capture the scene in front of the user and transfer the images to the ESP32 and the Raspberry Pi. A TensorFlow Lite Micro person detection model runs on the microcontroller, and a TensorFlow Lite object detection model runs on the Raspberry Pi; both models are efficient in terms of memory and processing time. The person detection model checks whether a person is present in front of the user and, if so, notifies the user via audio feedback. The object detection model recognizes 80 different types of objects in the images and speaks the names of the detected objects. In addition, the system provides audio narration of the visual scene (image captioning converted to speech); the image narration model runs on the server and requires internet connectivity.
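
    The following is a minimal Python sketch of the on-device detection-to-speech loop described in the abstract, assuming a generic SSD-style TensorFlow Lite model trained on the 80-class COCO set and an offline text-to-speech engine (pyttsx3). The model file, label file, camera index, confidence threshold, and output tensor ordering are illustrative assumptions, not the paper's actual implementation.

    import numpy as np
    import cv2                                       # camera capture on the Raspberry Pi
    import pyttsx3                                   # offline text-to-speech for audio feedback
    from tflite_runtime.interpreter import Interpreter

    MODEL_PATH = "ssd_mobilenet_coco_quant.tflite"   # hypothetical COCO-trained Lite model
    LABEL_PATH = "coco_labels.txt"                   # hypothetical label file, one class per line

    with open(LABEL_PATH) as f:
        labels = [line.strip() for line in f]

    interpreter = Interpreter(model_path=MODEL_PATH)
    interpreter.allocate_tensors()
    input_detail = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()
    _, height, width, _ = input_detail["shape"]

    tts = pyttsx3.init()                             # speech engine for spoken notifications
    camera = cv2.VideoCapture(0)                     # wearable camera feeding the Raspberry Pi

    while True:
        ok, frame = camera.read()
        if not ok:
            break
        # Resize the frame to the model's expected input and run inference on-device.
        resized = cv2.resize(frame, (int(width), int(height)))
        interpreter.set_tensor(
            input_detail["index"],
            np.expand_dims(resized, axis=0).astype(input_detail["dtype"]),
        )
        interpreter.invoke()
        # Typical SSD-style TFLite outputs: boxes, class indices, scores, count.
        # This output ordering is an assumption and may differ for other models.
        class_ids = interpreter.get_tensor(output_details[1]["index"])[0]
        scores = interpreter.get_tensor(output_details[2]["index"])[0]
        detected = {labels[int(c)] for c, s in zip(class_ids, scores) if s > 0.5}
        if detected:
            tts.say("I see " + ", ".join(sorted(detected)))  # speak detected object names
            tts.runAndWait()

    In the full system described in the abstract, the same frames would also be forwarded to the server-side image-captioning model when internet connectivity is available, while the TensorFlow Lite Micro person detector on the ESP32 provides a lighter-weight on-device notification path.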
    DOI/handle
    http://dx.doi.org/10.1109/ISNCC58260.2023.10323988
    http://hdl.handle.net/10576/66059
    Collections
    • Computer Science & Engineering [‎2482‎ items ]
