
Title: Multi-class Twitter Data Categorization and Geocoding with a Novel Computing Framework

Abstract: Transportation data analysis is becoming a major application area for computing. The operation and management of transportation systems have been transformed by advances in computing technology. This study presents one such advance: a novel computing framework for transportation data analysis. The framework combines a Labelled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with a supporting computing strategy to use publicly available Twitter data, which is 1% of all Twitter data, to identify transportation-related events (i.e., incidents, congestion, special events, construction, and other events) and provide reliable information to travelers. The analytical approach analyzes tweets using text classification and geocodes locations based on string similarity. A case study of New York City and its surrounding areas demonstrates the feasibility of the approach. In total, almost 700,010 tweets are analyzed to extract relevant transportation-related information for one week. For a large geographic area such as New York City and its surrounding areas, parallel computation achieves a 30-fold speedup over sequential processing in analyzing transportation-related tweets. The SVM classifier achieves more than 85% accuracy in identifying transportation-related tweets from structured data. To further categorize the transportation-related tweets into sub-classes (incident, congestion, construction, special events, and other events), three supervised classifiers are used: L-LDA, SVM, and L-LDA-incorporated SVM. The analytical framework, which uses the L-LDA-incorporated SVM, classifies roadway transportation-related data from Twitter with over 98.3% accuracy, significantly higher than the accuracies achieved by standalone L-LDA and SVM.
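
The abstract mentions geocoding locations based on string similarity but does not say which similarity measure or place-name list is used. The following is only a minimal sketch of the general idea, assuming a small hypothetical gazetteer (`GAZETTEER`, with approximate illustrative coordinates), a hypothetical `geocode` helper, an arbitrary threshold of 0.8, and Python's standard `difflib.SequenceMatcher` ratio as a stand-in similarity measure. It is not the authors' implementation.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Names and approximate coordinates are illustrative assumptions only.
GAZETTEER = {
    "Brooklyn Bridge": (40.7061, -73.9969),
    "Lincoln Tunnel": (40.7628, -74.0107),
    "George Washington Bridge": (40.8517, -73.9527),
}


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def geocode(tweet, gazetteer=GAZETTEER, threshold=0.8):
    """Return (name, coords, score) for the best fuzzy match, or None.

    Slides a window of as many words as each place name has over the
    tweet and keeps the highest-scoring window/name pair.
    """
    # Strip common punctuation and hashtag/mention markers from each token.
    words = [w.strip(".,!?;:#@") for w in tweet.split()]
    best_name, best_score = None, 0.0
    for name in gazetteer:
        n = len(name.split())
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            score = similarity(window, name)
            if score > best_score:
                best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, gazetteer[best_name], best_score
    return None


if __name__ == "__main__":
    # The misspelled "Brookyn" still matches "Brooklyn Bridge".
    print(geocode("Accident on the Brookyn Bridge, expect delays"))
    # No place name is close enough, so this prints None.
    print(geocode("Great coffee this morning"))
```

A fuzzy match of this kind tolerates the misspellings and abbreviations common in tweets, at the cost of possible false matches when the threshold is set too low.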
Subjects: Social and Information Networks (cs.SI)
Cite as: arXiv:1905.02916 [cs.SI]
  (or arXiv:1905.02916v1 [cs.SI] for this version)

Submission history

From: Sakib Khan
[v1] Wed, 8 May 2019 05:08:59 GMT (1397kb)
[v2] Thu, 18 Jul 2019 14:10:58 GMT (1273kb)
[v3] Wed, 28 Aug 2019 19:04:13 GMT (1339kb)
