Title: Classifying Vietnamese Disease Outbreak Reports with Important Sentences and Rich Features

Abstract: Text classification has been an important field of research since the mid-1990s. One of its many applications is in Web-based biosurveillance systems, which identify and summarize online disease outbreak reports. In this paper we focus on classifying Vietnamese disease outbreak reports. We investigate important properties of these reports, for example sentences that contain outbreak disease names, and location mentions. Evaluation with 10 repetitions of 10-fold cross-validation using the Support Vector Machine algorithm shows that using sentences containing outbreak disease names, together with their preceding and following sentences, in combination with location features achieves the best F-score of 86.67%, an improvement of 0.38% over using all raw text. Our results suggest that using important sentences and rich features can improve the performance of Vietnamese disease outbreak text classification.
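
The sentence-selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gazetteers `DISEASE_NAMES` and `LOCATION_NAMES`, the helper names, the naive sentence splitter, and the one-sentence context window are all assumptions made for illustration, and English placeholder terms stand in for Vietnamese ones. The sketch keeps only sentences that mention a disease name, plus their neighbors, and then builds a bag-of-words feature dictionary with added location indicators. A dictionary of this kind could then be vectorized and passed to an SVM.

```python
import re

# Hypothetical gazetteers (assumptions); the paper's actual lists are not given here.
DISEASE_NAMES = {"cholera", "dengue"}
LOCATION_NAMES = {"hanoi", "da nang"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def important_sentences(sentences, window=1):
    """Keep sentences mentioning a disease name, plus `window` neighbors on each side."""
    keep = set()
    for i, sentence in enumerate(sentences):
        lowered = sentence.lower()
        if any(name in lowered for name in DISEASE_NAMES):
            start = max(0, i - window)
            stop = min(len(sentences), i + window + 1)
            keep.update(range(start, stop))
    return [sentences[i] for i in sorted(keep)]


def extract_features(text):
    """Bag-of-words counts over the selected sentences, plus location indicators."""
    selected = " ".join(important_sentences(split_sentences(text))).lower()
    features = {}
    for token in re.findall(r"\w+", selected):
        key = "w=" + token
        features[key] = features.get(key, 0) + 1
    for location in LOCATION_NAMES:
        if location in selected:
            features["LOC=" + location] = 1
    return features


if __name__ == "__main__":
    report = (
        "Weather was mild. Officials met on Monday. "
        "A cholera case was confirmed in Hanoi. Two patients recovered. "
        "Schools stay open. Markets are busy."
    )
    print(important_sentences(split_sentences(report)))
    print(extract_features(report))
```

Substring matching against a gazetteer is the simplest possible choice here and can over-match inside longer words; a real system for Vietnamese would more plausibly rely on word segmentation before matching.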
Comments: 5 pages, 2 tables
Subjects: Computation and Language (cs.CL)
Journal reference: Proc. of the Third Symposium on Information and Communication Technology (SoICT), pages 260-265, 2012
DOI: 10.1145/2350716.2350755
Cite as: arXiv:1911.09883 [cs.CL]
  (or arXiv:1911.09883v1 [cs.CL] for this version)

Submission history

From: Son Doan
[v1] Fri, 22 Nov 2019 06:46:11 GMT (212kb)
