Computer Science > Computation and Language
Title: Table Pretraining: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks
(Submitted on 24 Jan 2022 (this version), latest version 29 Apr 2022 (v4))
Abstract: Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pretraining frameworks have been proposed following the success of pretraining on text and images, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction. To fully use the supervision signals in unlabeled tables, a variety of pretraining objectives have been designed and evaluated, for example, denoising cell values, predicting numerical relationships, and implicitly executing SQL queries. To best leverage the characteristics of (semi-)structured tables, various tabular language models, particularly those with specially designed attention mechanisms, have been explored. Since tables usually appear alongside and interact with free-form text, table pretraining usually takes the form of table-text joint pretraining, which attracts significant research interest from multiple domains. This survey aims to provide a comprehensive review of different model designs, pretraining objectives, and downstream tasks for table pretraining, and we share our thoughts and vision on existing challenges and future opportunities.
Submission history
From: Haoyu Dong
[v1] Mon, 24 Jan 2022 15:22:24 GMT (678kb,D)
[v2] Thu, 27 Jan 2022 03:41:50 GMT (680kb,D)
[v3] Tue, 19 Apr 2022 08:51:04 GMT (892kb,D)
[v4] Fri, 29 Apr 2022 13:19:56 GMT (898kb,D)