We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computation and Language

Title: Deep Learning based Topic Analysis on Financial Emerging Event Tweets

Abstract: Financial analyses of stock markets rely heavily on quantitative approaches in an attempt to predict subsequent or market movements based on historical prices and other measurable metrics. These quantitative analyses might have missed out on un-quantifiable aspects like sentiment and speculation that also impact the market. Analyzing vast amounts of qualitative text data to understand public opinion on social media platform is one approach to address this gap. This work carried out topic analysis on 28264 financial tweets [1] via clustering to discover emerging events in the stock market. Three main topics were discovered to be discussed frequently within the period. First, the financial ratio EPS is a measure that has been discussed frequently by investors. Secondly, short selling of shares were discussed heavily, it was often mentioned together with Morgan Stanley. Thirdly, oil and energy sectors were often discussed together with policy. These tweets were semantically clustered by a method consisting of word2vec algorithm to obtain word embeddings that map words to vectors. Semantic word clusters were then formed. Each tweet was then vectorized using the Term Frequency-Inverse Document Frequency (TF-IDF) values of the words it consisted of and based on which clusters its words were in. Tweet vectors were then converted to compressed representations by training a deep-autoencoder. K-means clusters were then formed. This method reduces dimensionality and produces dense vectors, in contrast to the usual Vector Space Model. Topic modelling with Latent Dirichlet Allocation (LDA) and top frequent words were used to analyze clusters and reveal emerging events.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2008.00670 [cs.CL]
  (or arXiv:2008.00670v1 [cs.CL] for this version)

Submission history

From: Yok Yen Nguwi [view email]
[v1] Mon, 3 Aug 2020 06:43:11 GMT (340kb)

Link back to: arXiv, form interface, contact.