Deep RL with Hierarchical Action Exploration for Dialogue Generation

Cho, Itsugun; Takahashi, Ryota; Yanase, Yusaku; Saito, Hiroaki

(Submitted on 22 Mar 2023 (this version), latest version 15 May 2023 (v3))

Abstract: Because the natural language action space is astronomically large, approximate dynamic programming applied to dialogue generation conventionally performs policy improvement with action sampling. However, this practice is inefficient for reinforcement learning (RL) because eligible responses (those with high action values) are very sparse, and the greedy policy sustained by random sampling is weak. This paper shows, both theoretically and experimentally, that the performance of a dialogue policy is positively correlated with the sampling size. To alleviate this limitation, we introduce a novel dual-granularity Q-function that explores the most promising response category and uses it to intervene in the sampling. It extracts actions following the granularity hierarchy, which can reach the optimum with fewer policy iterations. Our approach learns via offline RL from multiple reward functions designed to recognize human emotional details. Empirical studies demonstrate that our algorithm outperforms the baseline methods. Further verification shows that it generates responses with higher expected rewards and higher controllability.
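
The sampling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learned Q-networks are replaced by hand-made value functions, and every name here (build_toy_space, q_fine, q_coarse, flat_greedy, hierarchical_greedy) is hypothetical. It only contrasts greedy selection over k uniformly sampled actions with selection that first picks the most promising category and then samples k actions inside it.

```python
import random


def build_toy_space(n_categories=10, per_category=100):
    """Build a toy action space; each action is a (category, index) pair."""
    categories = {
        c: [(c, i) for i in range(per_category)] for c in range(n_categories)
    }
    actions = [a for pool in categories.values() for a in pool]

    def q_fine(action):
        # Assumed fine-grained value of one response (stand-in for a learned Q).
        c, i = action
        return c + i / per_category

    def q_coarse(category):
        # Assumed coarse-grained value: mean fine value over the category.
        pool = categories[category]
        return sum(q_fine(a) for a in pool) / len(pool)

    return categories, actions, q_fine, q_coarse


def flat_greedy(actions, q_fine, k, rng):
    """Greedy choice among k actions sampled uniformly from the whole space."""
    sample = rng.sample(actions, k)
    return max(sample, key=q_fine)


def hierarchical_greedy(categories, q_coarse, q_fine, k, rng):
    """Pick the best category by coarse value, then act greedily on k samples from it."""
    best_category = max(categories, key=q_coarse)
    pool = categories[best_category]
    sample = rng.sample(pool, min(k, len(pool)))
    return max(sample, key=q_fine)


if __name__ == "__main__":
    rng = random.Random(0)
    categories, actions, q_fine, q_coarse = build_toy_space()
    trials = 300
    for k in (1, 5, 25):
        flat = sum(q_fine(flat_greedy(actions, q_fine, k, rng)) for _ in range(trials))
        hier = sum(
            q_fine(hierarchical_greedy(categories, q_coarse, q_fine, k, rng))
            for _ in range(trials)
        )
        print(f"k={k}: flat mean={flat / trials:.2f}, hierarchical mean={hier / trials:.2f}")
```

In this toy setting the flat policy improves as k grows, in line with the abstract's claim that performance correlates with sampling size, while the category-guided policy reaches high-value actions with a small k. How the paper actually trains and combines the two granularities is not specified in the abstract, so the coarse value used here (a category mean) is purely an assumption.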

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.13465 [cs.CL]
	(or arXiv:2303.13465v1 [cs.CL] for this version)

Submission history

From: Itsugun Cho
[v1] Wed, 22 Mar 2023 09:29:22 GMT (890kb,D)
[v2] Sat, 6 May 2023 08:16:00 GMT (2323kb,D)
[v3] Mon, 15 May 2023 08:04:18 GMT (2358kb,D)
