Title: Mining Events with Declassified Diplomatic Documents

Abstract: Since 1973 the State Department has used electronic records systems to preserve classified communications. Recently, approximately 1.9 million of these records from 1973-77 have been made available by the U.S. National Archives. Some of these communication streams have periods in which the rate of transmission accelerates, while others show no notable pattern in communication intensity. Given the sheer volume of these communications, far greater than what had been available until now, scholars need automated statistical techniques to identify the communications that warrant closer study. We develop a statistical framework that can semi-automatically identify, from a large corpus of documents, a handful of electronic records that historians would consider more interesting. Our approach brings together related but distinct statistical concepts from nonparametric signal estimation and statistical hypothesis testing, which together help us identify and analyze various geometrical aspects of the communication streams. Dominant periods of heightened and sustained activity, also known as bursts, as identified through these methods, correspond well with historical events recognized by standard reference works on the 1970s.
Subjects: Methodology (stat.ME); Applications (stat.AP)
Cite as: arXiv:1712.07319 [stat.ME]
  (or arXiv:1712.07319v1 [stat.ME] for this version)

Submission history

From: Yuanjun Gao
[v1] Wed, 20 Dec 2017 04:39:28 GMT (520kb,D)
