Title: Principled Paraphrase Generation with Parallel Corpora

Abstract: Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision. In this paper, we formalize the implicit similarity function induced by this approach, and show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation. Based on these insights, we design an alternative similarity metric that mitigates this issue by requiring the entire translation distribution to match, and implement a relaxation of it through the Information Bottleneck method. Our approach incorporates an adversarial term into MT training in order to learn representations that encode as much information about the reference translation as possible, while keeping as little information about the input as possible. Paraphrases can be generated by decoding back to the source from this representation, without having to generate pivot translations. In addition to being more principled and efficient than round-trip MT, our approach offers an adjustable parameter to control the fidelity-diversity trade-off, and obtains better results in our experiments.
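The failure mode described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the sentence labels, probabilities, and function names below are all hypothetical, the backward model is derived with Bayes' rule under an assumed uniform prior, and total variation distance stands in as one possible way to compare whole translation distributions. The round-trip score sums, over pivot translations y, the probability of translating x to y and then y back to x'.

```python
from collections import defaultdict

# Hypothetical toy translation model P(y | x). All labels and numbers are
# made up for illustration; they do not come from the paper.
FORWARD = {
    "x1": {"y_amb": 0.5, "y_a": 0.5},
    "x2": {"y_amb": 0.5, "y_b": 0.5},  # NOT a paraphrase of x1, shares y_amb only
    "x3": {"y_amb": 0.5, "y_a": 0.5},  # paraphrase of x1, same distribution
}


def backward_model(forward):
    """Derive P(x | y) from P(y | x) via Bayes' rule, assuming a uniform prior over x."""
    totals = defaultdict(float)
    for dist in forward.values():
        for y, p in dist.items():
            totals[y] += p
    backward = defaultdict(dict)
    for x, dist in forward.items():
        for y, p in dist.items():
            backward[y][x] = p / totals[y]
    return backward


def round_trip_score(forward, backward, x, x_prime):
    """Sum over pivots y of P(y | x) * P(x_prime | y)."""
    return sum(p * backward[y].get(x_prime, 0.0) for y, p in forward[x].items())


def distribution_match(forward, x, x_prime):
    """1 minus the total variation distance between the two translation distributions.

    This is an illustrative stand-in for "requiring the entire translation
    distribution to match", not the metric defined in the paper.
    """
    pivots = set(forward[x]) | set(forward[x_prime])
    tv = 0.5 * sum(
        abs(forward[x].get(y, 0.0) - forward[x_prime].get(y, 0.0)) for y in pivots
    )
    return 1.0 - tv


BACKWARD = backward_model(FORWARD)
for other in ("x2", "x3"):
    print(
        other,
        round(round_trip_score(FORWARD, BACKWARD, "x1", other), 4),
        round(distribution_match(FORWARD, "x1", other), 4),
    )
```

In this toy setup the non-paraphrase x2 still receives a nonzero round-trip score, purely because it shares the ambiguous pivot y_amb with x1, while the distribution-level comparison separates it from the true paraphrase x3, whose translation distribution is identical to that of x1.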
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2205.12213 [cs.CL]
  (or arXiv:2205.12213v3 [cs.CL] for this version)

Submission history

From: Aitor Ormazabal
[v1] Tue, 24 May 2022 17:22:42 GMT (177kb,D)
[v2] Mon, 22 May 2023 16:32:56 GMT (582kb,D)
[v3] Tue, 23 May 2023 06:42:36 GMT (177kb,D)
