FiLM: Visual Reasoning with a General Conditioning Layer

Perez, Ethan; Strub, Florian; de Vries, Harm; Dumoulin, Vincent; Courville, Aaron

(Submitted on 22 Sep 2017 (v1), last revised 18 Dec 2017 (this version, v2))

Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
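The feature-wise affine transformation described in the abstract can be illustrated with a small sketch: each feature map is scaled and shifted by a pair of numbers computed from the conditioning input. The sketch below is pure Python and only illustrative. The function names (`linear_generator`, `film`), the single linear map used as the conditioning network, and all numeric values are assumptions made for this example, not the authors' released code.

```python
# Minimal sketch of feature-wise linear modulation (FiLM), pure Python.
# All names and values here are illustrative assumptions.

def linear_generator(cond, weights, bias):
    """Hypothetical conditioning network: one linear map from a conditioning
    vector to 2*C numbers, split into per-feature scales and shifts."""
    out = [sum(w * c for w, c in zip(row, cond)) + b
           for row, b in zip(weights, bias)]
    half = len(out) // 2
    return out[:half], out[half:]  # (gamma, beta)


def film(features, gamma, beta):
    """Apply gamma[c] * x + beta[c] to every value of feature map c.

    features: list of C feature maps, each an H x W list of lists.
    """
    return [[[g * x + b for x in row] for row in fmap]
            for fmap, g, b in zip(features, gamma, beta)]


# Two 2x2 feature maps and a 3-dimensional conditioning vector.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[0.5, 0.5], [1.0, 1.0]]]
cond = [1.0, 0.0, -1.0]
weights = [[1.0, 0.0, 0.0],   # row producing gamma for feature 0
           [0.0, 0.0, 1.0],   # row producing gamma for feature 1
           [0.0, 1.0, 0.0],   # row producing beta for feature 0
           [1.0, 1.0, 1.0]]   # row producing beta for feature 1
bias = [1.0, 0.0, 0.5, 0.0]

gamma, beta = linear_generator(cond, weights, bias)
modulated = film(features, gamma, beta)
```

Because the scale and shift are shared across all spatial positions of a feature map, the conditioning input needs to produce only two numbers per feature, regardless of the map's height and width.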

Comments:	AAAI 2018. Code available at this http URL. Extends arXiv:1707.03017
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:1709.07871 [cs.CV]
	(or arXiv:1709.07871v2 [cs.CV] for this version)

Submission history

From: Ethan Perez
[v1] Fri, 22 Sep 2017 17:54:12 GMT (6311kb,D)
[v2] Mon, 18 Dec 2017 21:25:53 GMT (6310kb,D)
