Statistics > Machine Learning
Title: Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
(Submitted on 11 Jun 2020 (this version), latest version 19 Oct 2021 (v4))
Abstract: Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and the difficulty of optimization over discrete weights. Many successful experimental results have recently been achieved using the empirical straight-through estimation approach. This approach has generated a variety of ad-hoc rules for propagating gradients through non-differentiable activations and updating discrete weights. We put such methods on a solid basis by obtaining them as viable approximations in the stochastic binary network (SBN) model with Bernoulli weights. In this model, gradients are well-defined and the weight probabilities can be optimized by continuous techniques. By choosing the activation noises in SBN appropriately and choosing mirror descent (MD) for optimization, we obtain methods that closely resemble several existing straight-through variants but, unlike them, all work reliably and produce equally good results. We further show that variational inference for Bayesian learning of binary weights can be implemented using MD updates with the same simplicity.
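To illustrate why gradients are well-defined once the binary activation is stochastic, here is a minimal sketch for a single neuron. It assumes standard logistic activation noise, which is only one possible choice, and all function names are hypothetical; it is not taken from the paper's implementation. With x = sign(a - z) and z logistic, the expectation is E[x] = 2*sigmoid(a) - 1, so its derivative with respect to the pre-activation a is 2*sigmoid(a)*(1 - sigmoid(a)). A straight-through style rule would use a derivative of this kind in the backward pass while the forward pass uses the sampled binary value.

```python
import math
import random


def sigmoid(x):
    """Logistic function, i.e. the CDF of standard logistic noise."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_binary_activation(a, rng):
    """Sample x = sign(a - z) in {-1, +1}, with z ~ standard logistic (assumed noise)."""
    u = rng.random()
    # Clamp u away from 0 and 1 so the inverse-CDF transform stays finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    z = math.log(u) - math.log(1.0 - u)  # inverse CDF of the logistic distribution
    return 1.0 if a - z >= 0.0 else -1.0


def expected_activation(a):
    """E[sign(a - z)] = P(z <= a) - P(z > a) = 2*sigmoid(a) - 1."""
    return 2.0 * sigmoid(a) - 1.0


def st_derivative(a):
    """Derivative of the expected activation w.r.t. a: twice the logistic density at a."""
    s = sigmoid(a)
    return 2.0 * s * (1.0 - s)


if __name__ == "__main__":
    rng = random.Random(0)
    a = 0.5
    n = 20000
    mc_mean = sum(sample_binary_activation(a, rng) for _ in range(n)) / n
    print("Monte Carlo mean:", mc_mean)
    print("Closed-form mean:", expected_activation(a))
    print("Derivative used in the backward pass:", st_derivative(a))
```

For a single neuron this derivative is exact for the expected output. In a multi-layer network, applying it layer by layer to sampled activations is an approximation, which is the kind of approximation the abstract refers to.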
Submission history
From: Alexander Shekhovtsov
[v1] Thu, 11 Jun 2020 23:58:18 GMT (370kb,D)
[v2] Tue, 2 Feb 2021 15:48:44 GMT (1488kb,D)
[v3] Thu, 7 Oct 2021 15:08:35 GMT (1553kb,D)
[v4] Tue, 19 Oct 2021 14:45:41 GMT (1633kb,D)