
Title: Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Abstract: Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.
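The abstract mentions analyzing how crowdworkers condense questions into queries. As a rough illustration only, the sketch below computes two simple conversion statistics over question-query pairs: average length in tokens, and the share of query tokens copied verbatim from the question. The record layout (`QuestionQueryPair` with `question` and `query` fields), the regex tokenizer, and the "copy rate" measure are all hypothetical choices made for this example. They are not the released dataset's schema or the paper's analysis, and the two toy pairs are invented.

```python
import re
from dataclasses import dataclass

# Hypothetical record layout: the released dataset's real schema may differ.
@dataclass
class QuestionQueryPair:
    question: str
    query: str

# Simplistic tokenizer (assumption): lowercase, keep runs of letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())

def copy_rate(pair: QuestionQueryPair) -> float:
    """Fraction of query tokens that also occur somewhere in the question."""
    query_tokens = tokenize(pair.query)
    if not query_tokens:
        return 0.0
    question_vocab = set(tokenize(pair.question))
    copied = sum(1 for tok in query_tokens if tok in question_vocab)
    return copied / len(query_tokens)

def summarize(pairs: list[QuestionQueryPair]) -> dict[str, float]:
    """Average question length, query length (in tokens), and copy rate."""
    n = len(pairs)
    if n == 0:
        return {"avg_question_len": 0.0, "avg_query_len": 0.0, "avg_copy_rate": 0.0}
    return {
        "avg_question_len": sum(len(tokenize(p.question)) for p in pairs) / n,
        "avg_query_len": sum(len(tokenize(p.query)) for p in pairs) / n,
        "avg_copy_rate": sum(copy_rate(p) for p in pairs) / n,
    }

if __name__ == "__main__":
    # Invented toy pairs, not taken from the dataset.
    toy = [
        QuestionQueryPair("How do I undo the last commit in git?", "git undo last commit"),
        QuestionQueryPair("What is the best way to keep basil fresh?", "store fresh basil"),
    ]
    print(summarize(toy))
```

In the first toy pair every query token appears in the question, while in the second the crowdworker introduced a new word ("store"), which lowers the copy rate. A real analysis would likely need stemming, stopword handling, and a proper tokenizer.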
Comments: ECIR 2020 Short Paper
Subjects: Information Retrieval (cs.IR)
DOI: 10.1007/978-3-030-45442-5_14
Cite as: arXiv:2004.02023 [cs.IR]
  (or arXiv:2004.02023v3 [cs.IR] for this version)

Submission history

From: Rishiraj Saha Roy
[v1] Sat, 4 Apr 2020 21:24:52 GMT (763kb,D)
[v2] Tue, 28 Apr 2020 18:53:21 GMT (763kb,D)
[v3] Thu, 3 Jun 2021 10:16:23 GMT (764kb,D)
