
Title: Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

Abstract: Transfer learning has become the dominant paradigm for many natural language processing tasks. In addition to being pretrained on large datasets, models can be further trained on intermediate (supervised) tasks that are similar to the target task. For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning on each NLI subtask. In this work, we explore Gradient Boosted Decision Trees (GBDTs) as an alternative to the commonly used Multi-Layer Perceptron (MLP) classification head. GBDTs have desirable properties: they perform well on dense, numerical features and are effective when the ratio of samples to features is low. We then introduce FreeGBDT, a method of fitting a GBDT head on the features computed during fine-tuning, which increases performance without requiring additional computation by the neural network. We demonstrate the effectiveness of our method on several NLI datasets using a strong baseline model (RoBERTa-large with MNLI pretraining). FreeGBDT shows a consistent improvement over the MLP classification head.
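To make the FreeGBDT idea concrete, here is a minimal, hypothetical sketch in pure Python. It is not the authors' implementation. It assumes that the features produced by every fine-tuning epoch's forward pass are cached together with their labels and pooled, and that a GBDT head is then fit on that pool, so the encoder is never run again just to build GBDT training data. For brevity it simplifies NLI to a binary task and uses depth-1 trees (stumps) fit by least squares to the log-loss pseudo-residuals. The function `forward_features` and the toy 2-dimensional "latent" vectors are illustrative stand-ins for a transformer encoder and its pooled sentence-pair features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a depth-1 tree (one threshold on one feature) to residuals."""
    mean_all = sum(residuals) / len(residuals)
    # Fallback is a constant stump: every row satisfies row[0] <= inf.
    best = (float("inf"), 0, float("inf"), mean_all, mean_all)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, thr, lmean, rmean)
    return best[1:]  # (feature, threshold, left_value, right_value)


def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


class StumpGBDT:
    """Binary gradient boosting on the log loss with decision stumps as weak learners."""

    def __init__(self, n_rounds=30, learning_rate=0.3):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.stumps = []
        self.base = 0.0

    def fit(self, X, y):
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base = math.log(p / (1 - p))  # initial score: log-odds of the positive class
        scores = [self.base] * len(X)
        for _ in range(self.n_rounds):
            # Negative gradient of the log loss w.r.t. the score is y - sigmoid(score).
            residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump_predict(stump, row)
                      for s, row in zip(scores, X)]
        return self

    def predict_proba(self, row):
        score = self.base + sum(self.learning_rate * stump_predict(st, row) for st in self.stumps)
        return sigmoid(score)

    def predict(self, row):
        return int(self.predict_proba(row) >= 0.5)


def forward_features(latent, epoch, rng):
    """Hypothetical encoder stand-in: features drift less as fine-tuning converges."""
    return [v + rng.gauss(0.0, 0.3 / (epoch + 1)) for v in latent]


rng = random.Random(0)
labels = [i % 2 for i in range(40)]
latents = [[(1.0 if y else -1.0) + rng.gauss(0.0, 0.5) for _ in range(2)] for y in labels]

# FreeGBDT-style pooling: keep the features each epoch's forward pass already produced.
pooled_X, pooled_y = [], []
n_epochs = 3
for epoch in range(n_epochs):
    for latent, y in zip(latents, labels):
        feats = forward_features(latent, epoch, rng)  # computed anyway for the MLP head's loss
        pooled_X.append(feats)
        pooled_y.append(y)
        # ... the MLP head loss and the encoder's backward pass would go here ...

gbdt = StumpGBDT().fit(pooled_X, pooled_y)

# At inference the features come from the final encoder; here we reuse the last epoch's.
final_X = pooled_X[-len(labels):]
accuracy = sum(gbdt.predict(x) == y for x, y in zip(final_X, labels)) / len(labels)
print(f"pooled rows: {len(pooled_X)}, accuracy on final-epoch features: {accuracy:.2f}")
```

Pooling across epochs is what makes the GBDT head "free" in this sketch: it multiplies the number of training rows for the trees by the number of epochs without any extra encoder passes. A practical setup would more likely use an established GBDT library and a multiclass loss for the three NLI labels.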
Comments: Findings of ACL 2021
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2105.03791 [cs.CL]
  (or arXiv:2105.03791v2 [cs.CL] for this version)

Submission history

From: Benjamin Minixhofer
[v1] Sat, 8 May 2021 22:31:51 GMT (382kb,D)
[v2] Tue, 8 Jun 2021 14:35:26 GMT (386kb,D)