Computer Science > Computer Vision and Pattern Recognition
Title: FiLM: Visual Reasoning with a General Conditioning Layer
(Submitted on 22 Sep 2017 (v1), last revised 18 Dec 2017 (this version, v2))
Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
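To make the "feature-wise affine transformation" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the scale and shift are produced, so this sketch assumes they come from two linear maps of the conditioning vector. The names `film`, `linear`, `gamma`, `beta`, `w_gamma`, `b_gamma`, `w_beta`, and `b_beta`, and the `[channel][row][col]` layout, are all hypothetical choices made for the example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]
FeatureMaps = List[List[List[float]]]  # assumed layout: [channel][row][col]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Return weights @ x + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(features: FeatureMaps, cond: Vector,
         w_gamma: Matrix, b_gamma: Vector,
         w_beta: Matrix, b_beta: Vector) -> FeatureMaps:
    """Scale and shift each feature map by values predicted from cond."""
    # Assumption: scale and shift are linear functions of the conditioning input.
    gamma = linear(w_gamma, b_gamma, cond)  # one scale per channel
    beta = linear(w_beta, b_beta, cond)     # one shift per channel
    # Every spatial position in a channel shares that channel's gamma and beta.
    return [
        [[g * value + b for value in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Toy input: 2 channels of 1x2 feature maps and a 2-dimensional conditioning vector.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
cond = [1.0, 0.0]
w_gamma, b_gamma = [[2.0, 0.0], [0.0, 1.0]], [0.0, 1.0]
w_beta, b_beta = [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.0]

out = film(features, cond, w_gamma, b_gamma, w_beta, b_beta)
print(out)  # [[[2.5, 4.5]], [[4.0, 5.0]]]
```

With these toy values the conditioning vector yields `gamma = [2.0, 1.0]` and `beta = [0.5, 1.0]`, so the first channel is doubled and shifted by 0.5 while the second is only shifted by 1.0. Changing `cond` changes how each channel is modulated without touching the feature maps themselves.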
Submission history
From: Ethan Perez
[v1] Fri, 22 Sep 2017 17:54:12 GMT (6311kb,D)
[v2] Mon, 18 Dec 2017 21:25:53 GMT (6310kb,D)