Motivation
Understanding group interactions is a multidisciplinary challenge at the intersection of artificial intelligence and communication science. In fields such as sociology and communication, researchers have long explored how collective states, including group rapport, shared attention, and team mood, arise from the interplay of individual behaviors. However, computational models and sensing methods that can capture these complex group phenomena remain underdeveloped. Most artificial intelligence research in social signal processing has concentrated on individuals or dyads, while the dynamics of larger groups remain less understood. Because group behavior is non-additive, meaning a group's state cannot be reduced to the sum of its members' states, this gap presents an important opportunity for the multimodal interaction community to broaden its focus.
Recent progress in multimodal sensing, which combines audio, video, and behavioral data with artificial intelligence modeling, has made it possible to detect subtle social cues in group contexts. At the same time, artificial intelligence agents, whether virtual or physical, are beginning to participate in team environments. This development raises significant questions: How can an artificial intelligence system accurately perceive the overall mood of a group and respond appropriately? Can an artificial intelligence facilitator improve group performance by monitoring engagement and participation? Advancing the underlying sensing and generation technologies will be essential to answer these questions and to strengthen collaboration between humans and artificial intelligence.
In summary, this workshop addresses the growing need to understand collective states, particularly as agentic artificial intelligence becomes increasingly integrated into social and collaborative settings, and aims to outline clear directions for future research and development toward these objectives.