Exploring Low-Cost Transformer Model Compression for Large-Scale Commercial Reply Suggestions

Shrivastava, Vaishnavi; Gaonkar, Radhika; Gupta, Shashank; Jha, Abhishek

(Submitted on 27 Nov 2021)

Abstract: Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems, but at the cost of unsustainable training times. Popular approaches to reducing training time are resource-intensive, so we explore low-cost model compression techniques such as Layer Dropping and Layer Freezing. We demonstrate the efficacy of these techniques in large-data scenarios, reducing the training time of a commercial email reply suggestion system by 42% without affecting model relevance or user engagement. We further study the robustness of these techniques to ablations of pre-trained model size and dataset size, and share several insights and recommendations for commercial applications.
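
The abstract names the two techniques without describing them, so the following is a minimal, dependency-free sketch of what they generally mean, not the authors' implementation. It assumes a model can be treated as an ordered list of layers; the `Layer` class, the helper names, the choice to drop the top layers and freeze the bottom ones, and all layer counts and parameter sizes are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Layer:
    """Toy stand-in for one transformer layer (hypothetical, illustration only)."""
    index: int
    num_params: int
    trainable: bool = True


def drop_top_layers(layers: List[Layer], num_to_drop: int) -> List[Layer]:
    """Layer dropping: remove the last `num_to_drop` layers before fine-tuning.

    Dropping from the top is an assumption of this sketch; other selection
    strategies are possible.
    """
    if not 0 <= num_to_drop < len(layers):
        raise ValueError("num_to_drop must leave at least one layer")
    return layers[: len(layers) - num_to_drop]


def freeze_bottom_layers(layers: List[Layer], num_to_freeze: int) -> List[Layer]:
    """Layer freezing: mark the first `num_to_freeze` layers as non-trainable.

    Frozen layers still run in the forward pass but receive no weight updates.
    """
    if not 0 <= num_to_freeze <= len(layers):
        raise ValueError("num_to_freeze must be between 0 and len(layers)")
    return [
        replace(layer, trainable=layer.trainable and position >= num_to_freeze)
        for position, layer in enumerate(layers)
    ]


def trainable_params(layers: List[Layer]) -> int:
    """Count parameters that would be updated during fine-tuning."""
    return sum(layer.num_params for layer in layers if layer.trainable)


# Hypothetical 12-layer model with an arbitrary 7M parameters per layer.
model = [Layer(index=i, num_params=7_000_000) for i in range(12)]
compressed = freeze_bottom_layers(drop_top_layers(model, 4), 4)
print(len(compressed), trainable_params(compressed))
```

In this toy setting, dropping shrinks both the forward and backward pass, while freezing only removes gradient updates for the frozen layers. The sketch says nothing about how much training time either technique saves in practice.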

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2111.13999 [cs.CL]
	(or arXiv:2111.13999v1 [cs.CL] for this version)

Submission history

From: Vaishnavi Shrivastava
[v1] Sat, 27 Nov 2021 22:42:06 GMT (130kb,D)
