We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: Efficient Detection and Filtering Systems for Distributed Training

Abstract: A plethora of modern machine learning tasks requires the utilization of large-scale distributed clusters as a critical component of the training pipeline. However, abnormal Byzantine behavior of the worker nodes can derail the training and compromise the quality of the inference. Such behavior can be attributed to unintentional system malfunctions or orchestrated attacks; as a result, some nodes may return arbitrary results to the parameter server (PS) that coordinates the training. Recent work considers a wide range of attack models and has explored robust aggregation and/or computational redundancy to correct the distorted gradients.
In this work, we consider attack models ranging from strong ones: $q$ omniscient adversaries with full knowledge of the defense protocol that can change from iteration to iteration to weak ones: $q$ randomly chosen adversaries with limited collusion abilities that only change every few iterations at a time. Our algorithms rely on redundant task assignments coupled with detection of adversarial behavior. For strong attacks, we demonstrate a reduction in the fraction of distorted gradients ranging from 16%-99% as compared to the prior state-of-the-art. Our top-1 classification accuracy results on the CIFAR-10 data set demonstrate a 25% advantage in accuracy (averaged over strong and weak scenarios) under the most sophisticated attacks compared to state-of-the-art methods.
Comments: 18 pages, 14 figures, 6 tables. The material in this work appeared in part at arXiv:2108.02416 which has been published at the 2022 IEEE International Symposium on Information Theory
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
Cite as: arXiv:2208.08085 [cs.LG]
  (or arXiv:2208.08085v3 [cs.LG] for this version)

Submission history

From: Konstantinos Konstantinidis [view email]
[v1] Wed, 17 Aug 2022 05:49:52 GMT (584kb,D)
[v2] Thu, 18 Aug 2022 01:31:58 GMT (583kb,D)
[v3] Mon, 22 Aug 2022 22:27:13 GMT (226kb,D)

Link back to: arXiv, form interface, contact.