Digitalizing the visible world: Achieving distributed perception using a mobile with an AI app

Research output: Contribution to conference

Knowledge of objects in spatial environments is obtained through multisensorial resources, such as sight, hearing, touch and smell (Mondada, 2019). For most people, the visual sense takes priority in the perception of objects’ spatial relation to the sensing body and the social world (Workman, 2016). We mostly use our eyes to unnoticeably perceive objects and their position in the environment (e.g. Goodwin & Goodwin, 1996). Visually impaired people (VIP), however, have no or very limited access to the visual aspects of these spatial relations when orienting towards their immediate surroundings. The emergence of computer vision and natural language processing (NLP) in accessible mainstream technology, such as smartphones, enables such devices to “translate” the visual world into language descriptions.

In this paper we explore the use of the app Microsoft SeeingAI, which enables VIP to receive computerized descriptions of “visual” information about objects, people and places by scanning the environment, much like using a flashlight to receive information about objects in the dark. Based on a video-ethnographic collection of VIP scanning the shelves while grocery shopping, we investigate how this ‘digitalizing’ of an everyday practice is done in situ. Through ethnomethodological multimodal conversation analysis (Streeck et al., 2011) we investigate environment scanning as a specific aspect of distributed perception (Due, 2021a), that is, focusing on the co-operative actions between sense-able agents. As VIP rely on multisensorial resources other than sight alone, the question is how the practice of scanning is achieved in situ. Our analysis shows that the practice of scanning inanimate distant objects requires complex coordination of the scanning device in relation to the object(s) being scanned, with regard to distance, the angle of the device, and the location of the relevant feature of the object. This paper focuses on three particular phenomena: i) scanning nearby surroundings for the location and identification of objects, ii) scanning specific objects to obtain information about the object, and iii) scanning text on the object. This paper contributes to research into blind and visually impaired people, the senses and perception in interaction, shopping activities (Due, 2017) and interactions with non-human robotic agents (cf. Due, 2021b).

Original language: English
Publication date: 2021
Publication status: Published - 2021
Event: Exploring Social Interaction (ESI): MOVIN 25 years, Online, Denmark
Duration: 23 Jun 2021 to 25 Jun 2021


Conference: Exploring Social Interaction (ESI)
