A No-Free-Lunch Theorem for MultiTask Learning

Hanneke, Steve; Kpotufe, Samory

(Submitted on 29 Jun 2020 (v1), revised 23 Jul 2020 (this version, v3), latest version 5 Aug 2020 (v4))

Abstract: Multitask learning and related areas such as multi-source domain adaptation address modern settings where datasets from $N$ related distributions $\{P_t\}$ are to be combined towards improving performance on any single such distribution ${\cal D}$. A perplexing fact remains in the evolving theory on the subject: while we would hope for performance bounds that account for the contribution from multiple tasks, the vast majority of analyses result in bounds that improve at best in the number $n$ of samples per task, but most often do not improve in $N$. As such, it might seem at first that the distributional settings or aggregation procedures considered in such analyses might be somehow unfavorable; however, as we show, the picture happens to be more nuanced, with interestingly hard regimes that might appear otherwise favorable.
In particular, we consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$, and which can be shown to admit a broad range of regimes with improved oracle rates in terms of $N$ and $n$. Some of our main results are as follows:
$\bullet$ We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists; that is, without access to distributional information, no algorithm can guarantee rates that improve with large $N$ for $n$ fixed.
$\bullet$ With a bit of additional information, namely, a ranking of tasks $\{P_t\}$ according to their distance to a target ${\cal D}$, a simple rank-based procedure can achieve near optimal aggregations of tasks' datasets, despite a search space exponential in $N$. Interestingly, the optimal aggregation might exclude certain tasks, even though they all share the same $h^*$.
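To make the phrase "a search space exponential in $N$" concrete, the sketch below counts candidate aggregations with and without a ranking. It rests on an assumption that the abstract does not state: that a rank-based procedure only needs to consider prefixes of the ranked task list (the $k$ closest tasks, for $k = 1, \dots, N$). The function names and task labels are hypothetical, and this is a counting illustration, not the authors' procedure.

```python
from itertools import combinations


def all_aggregations(tasks):
    """Every non-empty subset of tasks: 2**N - 1 candidates."""
    return [
        subset
        for size in range(1, len(tasks) + 1)
        for subset in combinations(tasks, size)
    ]


def prefix_aggregations(ranked_tasks):
    """Only the N prefixes of a list sorted from closest to farthest.

    Assumption (not from the abstract): a ranking lets a procedure
    restrict its search to these prefixes.
    """
    return [tuple(ranked_tasks[:k]) for k in range(1, len(ranked_tasks) + 1)]


ranked = ["P1", "P2", "P3", "P4"]  # hypothetical ranking, closest first
print(len(all_aggregations(ranked)))     # 15
print(len(prefix_aggregations(ranked)))  # 4
```

Under this reading, any prefix that stops before the last task leaves out the farthest tasks, which matches the abstract's remark that the optimal aggregation might exclude certain tasks even though they all share the same $h^*$.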

Subjects:	Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as:	arXiv:2006.15785 [cs.LG]
	(or arXiv:2006.15785v3 [cs.LG] for this version)

Submission history

From: Steve Hanneke
[v1] Mon, 29 Jun 2020 03:03:29 GMT (156kb,D)
[v2] Mon, 13 Jul 2020 16:13:18 GMT (156kb,D)
[v3] Thu, 23 Jul 2020 17:43:21 GMT (156kb,D)
[v4] Wed, 5 Aug 2020 18:05:50 GMT (157kb,D)
