DQI: A Guide to Benchmark Evaluation

Mishra, Swaroop; Arunkumar, Anjana; Sachdeva, Bhavdeep; Bryan, Chris; Baral, Chitta

(Submitted on 10 Aug 2020)

Abstract: A 'state of the art' model A surpasses humans in a benchmark B, but fails on similar benchmarks C, D, and E. What does B have that the other benchmarks do not? Recent research provides the answer: spurious bias. However, developing A to solve benchmarks B through E does not guarantee that it will solve future benchmarks. To progress towards a model that 'truly learns' an underlying task, we need to quantify the differences between successive benchmarks, as opposed to existing binary and black-box approaches. We propose a novel approach to solve this underexplored task of quantifying benchmark quality by debuting a data quality metric: DQI.

Comments:	ICML UDL 2020
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Cite as:	arXiv:2008.03964 [cs.CL]
	(or arXiv:2008.03964v1 [cs.CL] for this version)

Submission history

From: Swaroop Mishra
[v1] Mon, 10 Aug 2020 08:38:55 GMT (41334kb,D)
