Author_Institution :
Univ. of Southern California, Los Angeles, CA, USA
Abstract :
Audio-visual data have been a key enabler of human observational research and practice. The confluence of sensing, communication, and computing technologies is allowing capture of and access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. Importantly, these data afford the analysis and interpretation of multimodal cues of verbal and non-verbal human behavior. These signals carry crucial information about not only a person's intent and identity but also underlying attitudes and emotions. Automatically capturing these cues, although highly challenging, offers the promise not just of efficient data processing but of tools for discovery that enable hitherto unimagined insights. Recent computational approaches that have leveraged judicious use of both data and knowledge have yielded significant advances in this regard, for example in deriving rich information from multimodal sources including human speech, language, and videos of visual behavior. This talk will focus on some of the advances and challenges in gathering such data and creating algorithms for machine processing of such cues. It will also introduce some of the freely available data resources for research.