Predicting Emotions in User-Generated Videos



Example frames of four emotion categories from the dataset we collected.

Overview

User-generated video collections have been expanding rapidly in recent years, and systems for the automatic analysis of these collections are in high demand. While extensive research efforts have been devoted to recognizing semantics like "birthday party" and "skiing", few attempts have been made to understand the emotions carried by the videos, e.g., "joy" and "sadness". In this work, we propose a comprehensive computational framework for predicting emotions in user-generated videos. We introduce a rigorously designed, manually annotated dataset collected from popular video-sharing websites, which can serve as a valuable benchmark for future research. A large set of features is extracted from this dataset, ranging from popular low-level visual descriptors and audio features to high-level semantic attributes. Results of a comprehensive set of experiments indicate that combining multiple types of features, such as the joint use of audio and visual cues, is important, and that attribute features carrying sentiment-level semantics are very effective.
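To give a concrete flavor of the multi-feature combination the experiments favor, the sketch below shows a common score-level (late) fusion scheme: one classifier per feature type, with class-probability scores averaged across modalities. This is a minimal illustration, not the configuration used in the paper; the feature dimensions, the synthetic data, and the SVM settings are all placeholders.

```python
# Minimal sketch of score-level (late) fusion across feature types.
# All data and dimensions below are placeholders, not the paper's setup.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for 100 videos and 8 emotion classes.
n_videos, n_classes = 100, 8
features = {
    "visual": rng.normal(size=(n_videos, 128)),
    "audio": rng.normal(size=(n_videos, 64)),
    "attribute": rng.normal(size=(n_videos, 32)),
}
labels = rng.integers(0, n_classes, size=n_videos)

# Train one SVM per modality; probability outputs enable score-level fusion.
classifiers = {
    name: SVC(kernel="rbf", probability=True).fit(X, labels)
    for name, X in features.items()
}

def predict_fused(test_features):
    # Late fusion: average the per-modality class-probability scores.
    scores = np.mean(
        [clf.predict_proba(test_features[name]) for name, clf in classifiers.items()],
        axis=0,
    )
    return scores.argmax(axis=1)

pred = predict_fused(features)  # evaluated on training data, for illustration only
print("training accuracy:", (pred == labels).mean())
```

Averaging probability scores is only one fusion choice; weighted averaging or early (feature-level) fusion are common alternatives when some modalities are more reliable than others.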

Related Publication:

Yu-Gang Jiang, Baohan Xu, and Xiangyang Xue. Predicting Emotions in User-Generated Videos. The 28th AAAI Conference on Artificial Intelligence (AAAI), Quebec City, Canada, July 2014.


Dataset

Since there is no public dataset for predicting emotions in user-generated videos, we constructed and released a benchmark dataset. Eight emotion categories are considered, following the well-known Plutchik's wheel of emotions. Videos were downloaded from both Flickr and YouTube and then filtered manually. The table on the right summarizes the final dataset.
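The exact layout of the released archive is documented with the download; purely as a hypothetical illustration, videos organized by the eight Plutchik categories could be indexed as follows (the directory names and file extension here are assumptions, not the dataset's actual structure).

```python
# Hypothetical indexing of videos by the eight primary Plutchik emotions;
# the real layout of the released dataset is defined by its own documentation.
from pathlib import Path

PLUTCHIK_EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]

def index_videos(root: str) -> dict[str, list[Path]]:
    """Collect video paths under root/<emotion>/ for each category."""
    root_path = Path(root)
    return {
        emotion: sorted((root_path / emotion).glob("*.mp4"))
        for emotion in PLUTCHIK_EMOTIONS
    }

if __name__ == "__main__":
    index = index_videos("emotion_dataset")  # hypothetical root directory
    for emotion, paths in index.items():
        print(f"{emotion}: {len(paths)} videos")
```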

Click here to download the dataset with all the features used in the paper (~11GB in total).
Note: Anyone who downloads this dataset must agree that 1) the data will be used for research purposes only, and 2) the authors of the above AAAI'14 paper and Fudan University make no warranties regarding this dataset, including but not limited to warranties of non-infringement.

Computational Approach


We designed a comprehensive computational system to predict emotions in videos. The framework of the system and several results are shown below. More details can be found in our AAAI 2014 paper.
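As one concrete example of the low-level visual descriptors mentioned above, the sketch below computes a simple video-level feature, a color histogram averaged over sampled frames, using OpenCV. The paper's actual feature set is far richer (multiple visual, audio, and attribute features), so this is illustrative only, and the bin count and sampling step are arbitrary choices.

```python
# Sketch of one simple low-level visual descriptor: a color histogram
# averaged over sampled frames. Illustrative only; the paper uses a much
# richer set of visual, audio, and attribute features.
import cv2
import numpy as np

def video_color_histogram(path: str, bins: int = 8, step: int = 30) -> np.ndarray:
    """Average per-frame BGR histograms, sampling one frame every `step`."""
    capture = cv2.VideoCapture(path)
    histograms = []
    frame_idx = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_idx % step == 0:
            hist = cv2.calcHist([frame], [0, 1, 2], None,
                                [bins] * 3, [0, 256] * 3).flatten()
            histograms.append(hist / (hist.sum() + 1e-8))  # L1-normalize
        frame_idx += 1
    capture.release()
    return np.mean(histograms, axis=0) if histograms else np.zeros(bins ** 3)
```

A descriptor like this would form one entry in the per-modality feature dictionary of the fusion sketch above, alongside audio and attribute features.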