Example frames of four emotion categories from the dataset we collected.
User-generated video collections have been expanding rapidly in recent years, and systems for automatically analyzing these collections are in high demand. While extensive research efforts have been devoted to recognizing semantics such as "birthday party" and "skiing", few attempts have been made to understand the emotions carried by the videos, e.g., "joy" and "sadness". In this work, we propose a comprehensive computational framework for predicting emotions in user-generated videos. We introduce a rigorously designed, manually annotated dataset collected from popular video-sharing websites, which can serve as a valuable benchmark for future research. A large set of features is extracted from this dataset, ranging from popular low-level visual descriptors and audio features to high-level semantic attributes. Results of a comprehensive set of experiments indicate that combining multiple types of features, such as the joint use of audio and visual cues, is important, and that attribute features carrying sentiment-level semantics are particularly effective.
Yu-Gang Jiang, Baohan Xu, and Xiangyang Xue. Predicting Emotions in User-Generated Videos. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), Quebec City, Canada, July 2014.
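As a rough illustration of the multi-feature fusion the abstract refers to, the sketch below trains one classifier per feature type and averages their scores (late fusion). This is not the paper's exact pipeline: the scikit-learn SVM choice, the feature names, and the dimensions are assumptions made purely for illustration.

```python
# A minimal late-fusion sketch (not the paper's exact pipeline): train one SVM
# per feature type, average their decision scores, and take the top-scoring
# emotion. Feature matrices below are random stand-ins for illustration only.
import numpy as np
from sklearn.svm import SVC

def train_per_feature(features, labels):
    """Fit one SVM per feature type; `features` maps a name to an (N, D) array."""
    return {name: SVC(kernel="rbf", decision_function_shape="ovr").fit(X, labels)
            for name, X in features.items()}

def late_fusion_predict(models, features):
    """Average per-feature one-vs-rest decision scores and pick the argmax class."""
    scores = np.mean([models[n].decision_function(X) for n, X in features.items()],
                     axis=0)
    classes = next(iter(models.values())).classes_
    return classes[scores.argmax(axis=1)]

# Toy example: 40 clips, 8 emotion classes, three hypothetical feature types.
rng = np.random.default_rng(0)
labels = np.tile(np.arange(8), 5)
feats = {
    "visual": rng.normal(size=(40, 128)),      # e.g., low-level visual descriptors
    "audio": rng.normal(size=(40, 64)),        # e.g., audio feature statistics
    "attributes": rng.normal(size=(40, 300)),  # e.g., semantic attribute scores
}
models = train_per_feature(feats, labels)
print(late_fusion_predict(models, feats)[:10])
```

Score averaging is only one simple fusion strategy; weighted combinations or kernel-level fusion are common alternatives when one feature type is clearly stronger than the others.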
Since there is no public dataset for predicting emotions in user-generated videos, we constructed and released a benchmark dataset. Eight emotion categories are considered, following the well-known Plutchik's wheel of emotions. Videos were downloaded from both Flickr and YouTube and then filtered manually. The table on the right summarizes the final dataset.

Note: People who download this dataset must agree that 1) the use of the data is restricted to research purposes only, and 2) the authors of the above AAAI'14 paper, and Fudan University, make no warranties regarding this dataset, including (but not limited to) warranties of non-infringement.
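For readers organizing the downloaded clips, a hypothetical loading sketch is given below. Only the eight Plutchik category names are taken from the description above; the folder-per-category layout, the root directory name, and the file extension are assumptions, not the dataset's documented format.

```python
# Hypothetical loader sketch: assumes (not confirmed by this page) that clips
# are stored as <root>/<emotion>/*.mp4, one folder per Plutchik category.
from pathlib import Path

PLUTCHIK_EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]

def list_videos(root):
    """Return {category: [video paths]} for a folder-per-category layout."""
    root = Path(root)
    return {
        emotion: sorted((root / emotion).glob("*.mp4"))
        for emotion in PLUTCHIK_EMOTIONS
        if (root / emotion).is_dir()
    }

if __name__ == "__main__":
    videos = list_videos("emotion_dataset")  # hypothetical root directory
    for emotion, clips in videos.items():
        print(f"{emotion}: {len(clips)} clips")
```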