Title: Demystify Optimization Challenges in Multilingual Transformers

Abstract: Multilingual Transformers improve parameter efficiency and crosslingual transfer, but how to train them effectively has not been well studied. Using multilingual machine translation as a testbed, we study optimization challenges from the perspectives of loss landscape and parameter plasticity. We find that imbalanced training data causes task interference between high and low resource languages, characterized by nearly orthogonal gradients for major parameters and an optimization trajectory dominated mostly by high resource languages. We show that the local curvature of the loss surface affects the degree of interference, and that existing data subsampling heuristics implicitly reduce sharpness but still face a trade-off between high and low resource languages. We propose a principled multi-objective optimization algorithm, Curvature Aware Task Scaling (CATS), which improves both optimization and generalization, especially for low resource languages. Experiments on the TED, WMT and OPUS-100 benchmarks demonstrate that CATS advances the Pareto front of accuracy while remaining efficient to apply in massively multilingual settings at the scale of 100 languages.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2104.07639 [cs.CL]
  (or arXiv:2104.07639v1 [cs.CL] for this version)
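
The abstract characterizes interference by "nearly orthogonal gradients" between high and low resource languages. As a rough illustration of what that measurement means, the sketch below computes the cosine similarity between two flattened gradient vectors. It is not the authors' code or the CATS algorithm; the function name and the toy gradient values are hypothetical and chosen only to make the two extreme cases visible.

```python
import math

def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(g_a, g_b))
    norm_a = math.sqrt(sum(x * x for x in g_a))
    norm_b = math.sqrt(sum(y * y for y in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention for this sketch: a zero gradient has no direction.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy gradients (not taken from the paper).
grad_high = [1.0, 0.0, 2.0]      # stand-in for a high resource language pair
grad_low = [0.0, 3.0, 0.0]       # stand-in for a low resource language pair
grad_aligned = [2.0, 0.0, 4.0]   # same direction as grad_high

sim_orthogonal = cosine_similarity(grad_high, grad_low)
sim_aligned = cosine_similarity(grad_high, grad_aligned)
```

A similarity near 0 means an update that helps one language is, to first order, neutral for the other, while a value near 1 means the two objectives pull the shared parameters in the same direction.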

Submission history

From: Xian Li
[v1] Thu, 15 Apr 2021 17:51:03 GMT (1010kb,D)
[v2] Tue, 20 Apr 2021 06:40:27 GMT (1010kb,D)
[v3] Sun, 13 Jun 2021 00:05:27 GMT (1054kb,D)
[v4] Tue, 30 Nov 2021 05:57:50 GMT (1055kb,D)
