We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: Vector Space Model as Cognitive Space for Text Classification

Abstract: In this era of digitization, knowing the user's sociolect aspects have become essential features to build the user specific recommendation systems. These sociolect aspects could be found by mining the user's language sharing in the form of text in social media and reviews. This paper describes about the experiment that was performed in PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of the users from their tweets. The sociolect aspects considered in this experiment are user's gender and native language information. Here user's tweets written in a different language from their native language are represented as Document - Term Matrix with document frequency as the constraint. Further classification is done using the Support Vector Machine by taking gender and native language as target classes. This experiment attains the average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.
Comments: 6 pages, 6 figures, 3 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
MSC classes: 68T50
Cite as: arXiv:1708.06068 [cs.CL]
  (or arXiv:1708.06068v1 [cs.CL] for this version)

Submission history

From: Barathi Ganesh H B [view email]
[v1] Mon, 21 Aug 2017 03:06:07 GMT (268kb,D)

Link back to: arXiv, form interface, contact.