
Publication details

2024, AVI '24: Proceedings of the 2024 International Conference on Advanced Visual Interfaces, Pages 1-5

VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures (04b Conference paper in proceedings)

De Marsico Maria, Giacanelli Chiara, Manganaro Clizia Giorgia, Palma Alessio, Santoro Davide

VQAsk is an Android application that helps visually impaired users get information about images framed by their smartphones. It lets them interact with their own photographs or the surrounding visual environment through a question-and-answer interface that integrates three modalities: speech interaction, haptic feedback that facilitates navigation and interaction, and sight. VQAsk is primarily designed to help visually impaired users mentally visualize what they cannot see, but it can also accommodate users with varying levels of visual ability. To this end, it embeds advanced NLP and Computer Vision techniques to answer user questions about the image on the phone screen. Image processing is enhanced by background removal, using advanced segmentation models to isolate the important elements of the image. The outcomes of a testing phase confirmed the value of this project as a first attempt at using AI-supported multimodality to enhance visually impaired users' experience.
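The abstract mentions background removal via segmentation models as a preprocessing step. As a rough illustration only (the paper does not name its model or pipeline), the sketch below uses an off-the-shelf DeepLabV3 network from torchvision to mask out pixels labeled as background; the model choice and every name and parameter here are assumptions, not the authors' implementation.

    # Illustrative sketch of segmentation-based background removal.
    # Assumption: a pretrained DeepLabV3 (PASCAL VOC classes) stands in
    # for the unspecified segmentation model used by VQAsk.
    import torch
    from torchvision.models.segmentation import (
        deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
    )
    from PIL import Image

    weights = DeepLabV3_ResNet50_Weights.DEFAULT
    model = deeplabv3_resnet50(weights=weights).eval()
    preprocess = weights.transforms()

    def remove_background(path: str) -> Image.Image:
        """Zero out pixels the segmentation model labels as background."""
        img = Image.open(path).convert("RGB")
        batch = preprocess(img).unsqueeze(0)       # shape [1, 3, H', W']
        with torch.no_grad():
            logits = model(batch)["out"][0]        # shape [21, H', W']
        mask = logits.argmax(0) != 0               # VOC class 0 = background
        # Resize the mask back to the original image size before compositing.
        mask_img = Image.fromarray(
            mask.byte().mul(255).numpy()
        ).resize(img.size, Image.NEAREST)
        # Keep foreground pixels; replace background with black.
        return Image.composite(img, Image.new("RGB", img.size), mask_img)

In an app like the one described, the resulting foreground-only image would then be passed to the visual question-answering model so that answers focus on the salient objects.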
ISBN: 9798400717642