Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Yanush, Viktor; Shekhovtsov, Alexander; Molchanov, Dmitry; Vetrov, Dmitry

(Submitted on 11 Jun 2020 (this version), latest version 19 Oct 2021 (v4))

Abstract: Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and the difficulty of optimization over discrete weights. Many successful experimental results have recently been achieved using the empirical straight-through estimation approach. This approach has generated a variety of ad hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. We put such methods on a solid basis by obtaining them as viable approximations in the stochastic binary network (SBN) model with Bernoulli weights. In this model, gradients are well-defined and the weight probabilities can be optimized by continuous techniques. By choosing the activation noises in SBN appropriately and choosing mirror descent (MD) for optimization, we obtain methods that closely resemble several existing straight-through variants but, unlike them, all work reliably and produce equally good results. We further show that variational inference for Bayesian learning of binary weights can be implemented using MD updates with the same simplicity.
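
As a rough illustration of the straight-through idea mentioned in the abstract, the sketch below considers a single stochastic binary unit x = sign(a - z) with x in {-1, +1}. It is not taken from the paper: the choice of standard logistic noise z and all function names (prob_plus, exact_grad, st_grad, expected_st_grad) are assumptions made for this example. Under that noise, P(x = +1) = sigmoid(a), so E[x] = 2 sigmoid(a) - 1, and a straight-through rule can replace the undefined derivative of sign with dE[x]/da = 2 p (1 - p). The code compares the expectation of this estimator with the exact gradient obtained by enumerating both states.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def prob_plus(a):
    # P(x = +1) for x = sign(a - z), z ~ standard logistic (assumed noise model)
    return sigmoid(a)

def exact_grad(loss, a):
    # d/da E[loss(x)], computed by enumerating the two states x in {-1, +1}
    p = prob_plus(a)
    return (loss(+1.0) - loss(-1.0)) * p * (1.0 - p)

def st_grad(dloss, a, x):
    # Straight-through estimate for one sampled state x:
    # the derivative of sign is replaced by dE[x]/da = 2 p (1 - p)
    p = prob_plus(a)
    return dloss(x) * 2.0 * p * (1.0 - p)

def expected_st_grad(dloss, a):
    # Expectation of the straight-through estimate over the two states
    p = prob_plus(a)
    return p * st_grad(dloss, a, +1.0) + (1.0 - p) * st_grad(dloss, a, -1.0)

# Linear loss: the straight-through estimate matches the exact gradient.
lin = lambda x: 3.0 * x
dlin = lambda x: 3.0

# Quadratic loss: the estimate is biased in general.
quad = lambda x: (x - 0.5) ** 2
dquad = lambda x: 2.0 * (x - 0.5)

a = 1.0
print(exact_grad(lin, a), expected_st_grad(dlin, a))
print(exact_grad(quad, a), expected_st_grad(dquad, a))
```

In this toy setting the two values agree for the linear loss and differ for the quadratic one, which is one way to see why a straight-through rule is better read as an approximation than as an exact gradient.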

Subjects:	Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2006.06880 [stat.ML]
	(or arXiv:2006.06880v1 [stat.ML] for this version)

Submission history

From: Alexander Shekhovtsov
[v1] Thu, 11 Jun 2020 23:58:18 GMT (370kb,D)
[v2] Tue, 2 Feb 2021 15:48:44 GMT (1488kb,D)
[v3] Thu, 7 Oct 2021 15:08:35 GMT (1553kb,D)
[v4] Tue, 19 Oct 2021 14:45:41 GMT (1633kb,D)
