Title: The Readability of Tweets and their Geographic Correlation with Education

Abstract: Twitter has rapidly emerged as one of the largest worldwide venues for written communication. Thanks to the ease with which vast quantities of tweets can be mined, Twitter has also become a source for studying modern linguistic style. The readability of text has long provided a simple method to characterize the complexity of language and the ease with which documents may be understood by readers. In this note we use a modified version of the Flesch Reading Ease formula, applied to a corpus of 17.4 million tweets. We find that tweets have characteristically more difficult readability scores than other short-format communication, such as SMS or chat. This linguistic difference is insensitive to the presence of "hashtags" within tweets. By utilizing geographic data provided by 2% of users, joined with "ZIP Code Tabulation Area" (ZCTA) level education data from the U.S. Census, we find an intriguing correlation between the average readability and the college graduation rate within a ZCTA. This points towards either a difference in the underlying language or a change in the type of content being tweeted in these areas.
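
For readers unfamiliar with the metric, the standard Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), where higher scores indicate easier text. The abstract does not say how the authors modified the formula for tweets, so the following is only a minimal sketch of the unmodified formula. The function names `count_syllables` and `flesch_reading_ease`, the vowel-group syllable heuristic, and the sentence-splitting rule are all assumptions made for illustration and are not taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate (an assumption, not the paper's method):
    count runs of consecutive vowels, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" has two vowel runs but one syllable; "table" keeps its final "le".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard (unmodified) Flesch Reading Ease score for English text."""
    # Assumption: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat on the mat."))
```

Short sentences built from short words push the score up, while long, polysyllabic wording pushes it down. A real tweet corpus would also need decisions about hashtags, URLs, and mentions, which this sketch leaves out.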
Comments: 4 page note
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Cite as: arXiv:1401.6058 [cs.SI]
  (or arXiv:1401.6058v1 [cs.SI] for this version)

Submission history

From: James RA Davenport
[v1] Thu, 23 Jan 2014 17:15:56 GMT (2894kb)
