The TechQA Dataset

Castelli, Vittorio; Chakravarti, Rishav; Dana, Saswati; Ferritto, Anthony; Florian, Radu; Franz, Martin; Garg, Dinesh; Khandelwal, Dinesh; McCarley, Scott; McCawley, Mike; Nasr, Mohamed; Pan, Lin; Pendus, Cezar; Pitrelli, John; Pujar, Saurabh; Roukos, Salim; Sakrajda, Andrzej; Sil, Avirup; Uceda-Sosa, Rosario; Ward, Todd; Zhang, Rong

(Submitted on 8 Nov 2019)

Abstract: We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain. The TechQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size (600 training, 310 dev, and 490 evaluation question/answer pairs), thus reflecting the cost of creating large labeled datasets with actual data. Consequently, TechQA is meant to stimulate research in domain adaptation rather than being a resource to build QA systems from scratch. The dataset was obtained by crawling the IBM Developer and IBM DeveloperWorks forums for questions with accepted answers that appear in a published IBM Technote, a technical document that addresses a specific technical issue. We also release a collection of the 801,998 publicly available Technotes as of April 4, 2019 as a companion resource that might be used for pretraining, to learn representations of the IT domain language.

Comments:	Long version of conference paper to be submitted
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:1911.02984 [cs.CL]
	(or arXiv:1911.02984v1 [cs.CL] for this version)

Submission history

From: Vittorio Castelli [view email]
[v1] Fri, 8 Nov 2019 02:04:39 GMT (471kb,D)
