IEEE 14th International Conference on Machine Learning and Applications (ICMLA), Florida, United States of America, 9-11 December 2015, pp. 644-648
Twitter is one of the largest microblogging websites, where users share news, opinions, moods, and recommendations by posting text messages, and it is largely used as a news medium. Since the volume of data shared via Twitter is vast, many researchers are focusing on extracting meaningful information from it with the help of information retrieval systems. Retrieving meaningful information from social media applications has become important for several tasks, such as sentiment analysis, anomaly detection, and recommendation systems. Topic modeling is one of the most studied and hardest problems in information retrieval, and it is even more challenging when the documents are very short, as tweets are. In this paper, we focus on developing an effective and efficient method to overcome the challenge that tweets are too short for topic modeling. We compare several topic modeling schemes based on Latent Dirichlet Allocation (LDA) that merge tweets in order to improve LDA performance, one of which has not been studied before. We also present our experimental results, obtained with unbiased data collection and evaluation methodologies.
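The idea of merging tweets before topic modeling can be illustrated with a minimal sketch. The abstract does not say which merging criteria the compared schemes use, so grouping by author here is purely an assumption for illustration, and the names `pool_tweets`, `key_fn`, and the toy data are hypothetical. The sketch only builds the merged pseudo-documents; each one would then be passed to an LDA implementation as a single document.

```python
from collections import defaultdict


def pool_tweets(tweets, key_fn):
    """Merge short tweets that share a key into one longer pseudo-document.

    key_fn is a hypothetical grouping function (author, hashtag, time
    window, ...); the choice of key is an assumption, not the paper's method.
    """
    pools = defaultdict(list)
    for tweet in tweets:
        pools[key_fn(tweet)].append(tweet["text"])
    # Join the texts of each group so LDA sees fewer, longer documents.
    return {key: " ".join(texts) for key, texts in pools.items()}


# Toy data, invented for illustration only.
tweets = [
    {"user": "alice", "text": "election results tonight"},
    {"user": "bob", "text": "new phone released"},
    {"user": "alice", "text": "voting lines are long"},
]

# Group by author (an assumed key): three tweets become two documents.
pooled = pool_tweets(tweets, key_fn=lambda t: t["user"])
```

Passing a different `key_fn` changes the merging criterion without touching the pooling logic, which would make it straightforward to compare several merging schemes on the same collection.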