We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

eess.AS

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computer Vision and Pattern Recognition

Title: AI in Pursuit of Happiness, Finding Only Sadness: Multi-Modal Facial Emotion Recognition Challenge

Authors: Carl Norman
Abstract: The importance of automated Facial Emotion Recognition (FER) grows the more common human-machine interactions become, which will only continue to increase dramatically with time. A common method to describe human sentiment or feeling is the categorical model the `7 basic emotions', consisting of `Angry', `Disgust', `Fear', `Happiness', `Sadness', `Surprise' and `Neutral'. The `Emotion Recognition in the Wild' (EmotiW) competition is now in its 7th year and has become the standard benchmark for measuring FER performance. The focus of this paper is the EmotiW sub-challenge of classifying videos in the `Acted Facial Expression in the Wild' (AFEW) dataset, consisting of both visual and audio modalities, into one of the above classes. Machine learning has exploded as a research topic in recent years, with advancements in `Deep Learning' a key part of this. Although Deep Learning techniques have been widely applied to the FER task by entrants in previous years, this paper has two main contributions: (i) to apply the latest `state-of-the-art' visual and temporal networks and (ii) exploring various methods of fusing features extracted from the visual and audio elements to enrich the information available to the final model making the prediction. There are a number of complex issues that arise when trying to classify emotions for `in-the-wild' video sequences, which the above two approaches attempt to directly address. There are some positive findings when comparing the results of this paper to past submissions, indicating that further research into the proposed methods and fine-tuning of the models deployed, could result in another step forwards in the field of automated FER.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as: arXiv:1911.05187 [cs.CV]
  (or arXiv:1911.05187v1 [cs.CV] for this version)

Submission history

From: Carl Norman [view email]
[v1] Thu, 24 Oct 2019 14:49:39 GMT (3493kb,D)

Link back to: arXiv, form interface, contact.