Sayer: Using Implicit Feedback to Optimize System Policies

Lécuyer, Mathias; Kim, Sang Hoon; Nanavati, Mihir; Jiang, Junchen; Sen, Siddhartha; Sharma, Amit; Slivkins, Aleksandrs

(Submitted on 28 Oct 2021)

Abstract: We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it had waited less than X min, because time has a cumulative property. This feedback tells us about alternative decisions, and can be used to improve the system policy. However, leveraging implicit feedback is difficult because it tends to be one-sided or incomplete, and may depend on the outcome of the event. As a result, existing practices for using feedback, such as simply incorporating it into a data-driven model, suffer from bias.
We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies. Sayer builds on two ideas from reinforcement learning -- randomized exploration and unbiased counterfactual estimators -- to leverage data collected by an existing policy to estimate the performance of new candidate policies, without actually deploying those policies. Sayer uses implicit exploration and implicit data augmentation to generate implicit feedback in an unbiased form, which is then used by an implicit counterfactual estimator to evaluate and train new policies. The key idea underlying these techniques is to assign implicit probabilities to decisions that are not actually taken but whose feedback can be inferred; these probabilities are carefully calculated to ensure statistical unbiasedness. We apply Sayer to two production scenarios in Azure, and show that it can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
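
To make the idea of implicit probabilities concrete, the following is a minimal sketch in Python. It is not the paper's estimator or code. It assumes a simplified setting in which a randomized logging policy picks a wait time from a small discrete set, and the outcome of any shorter wait is revealed as a side effect. The names `reward`, `implicit_ips_value`, `logging_probs`, and the per-minute cost `COST_PER_MIN` are hypothetical and chosen only for illustration. The outcome of a candidate wait `target_wait` is revealed whenever the logged wait is at least as long, so its implicit probability is the total probability of all such logged waits, and each revealed reward is reweighted by the inverse of that probability.

```python
# Illustrative sketch only: a simplified inverse-propensity estimator that uses
# implicit feedback from threshold (wait-time) decisions. All names and the
# reward definition are assumptions, not taken from the paper.

COST_PER_MIN = 0.1  # hypothetical cost of waiting one minute


def reward(wait, observed_event_time):
    """Hypothetical reward: 1 if the event occurred within `wait`, minus waiting cost.

    `observed_event_time` is None when no event was seen during the logged wait.
    """
    occurred = observed_event_time is not None and observed_event_time <= wait
    return (1.0 if occurred else 0.0) - COST_PER_MIN * wait


def implicit_ips_value(logs, logging_probs, target_wait):
    """Estimate the mean reward of a policy that always waits `target_wait`.

    logs: list of (chosen_wait, observed_event_time) pairs from the logging policy.
    logging_probs: dict mapping each possible wait to its logging probability.
    """
    # Implicit probability that the outcome of `target_wait` is revealed:
    # the logging policy must have waited at least that long.
    reveal_prob = sum(p for w, p in logging_probs.items() if w >= target_wait)
    if reveal_prob <= 0:
        raise ValueError("target_wait is never revealed by the logging policy")

    total = 0.0
    for chosen_wait, observed_event_time in logs:
        if chosen_wait >= target_wait:
            # The outcome of the shorter wait is known from the longer one.
            total += reward(target_wait, observed_event_time) / reveal_prob
        # Otherwise the outcome of `target_wait` is unknown; contribute zero.
    return total / len(logs)


if __name__ == "__main__":
    probs = {1: 0.5, 2: 0.3, 3: 0.2}
    logs = [(3, 2), (1, None), (2, 2), (3, None)]
    print(implicit_ips_value(logs, probs, target_wait=2))
```

Under these assumptions, a record logged with a 3-minute wait contributes to the estimates for 1, 2, and 3 minutes alike, which is how the cumulative property of time is turned into extra evaluation data without biasing the estimate toward longer waits.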

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2110.14874 [cs.LG]
	(or arXiv:2110.14874v1 [cs.LG] for this version)

Submission history

From: Mathias Lecuyer
[v1] Thu, 28 Oct 2021 04:16:56 GMT (7180kb,D)
